Statistics in Crosstabulations - Pearson Chi-square

The Pearson Chi-square is the most common test for significance of the relationship between categorical variables. This measure is based on the fact that we can compute the expected frequencies in a two-way table (i.e., frequencies that we would expect if there were no relationship between the variables). For example, suppose we ask 20 males and 20 females to choose between two brands of soda pop (brands A and B). If there is no relationship between preference and gender, then we would expect males and females to divide their choices between brand A and brand B in about the same proportions. The Chi-square test becomes increasingly significant as the observed frequencies deviate further from this expected pattern, that is, the more the choices of males differ from those of females.
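As an illustration, the expected frequency of each cell can be computed as its row total times its column total divided by the total number of observations, and the Pearson Chi-square sums (observed - expected)^2 / expected over all cells. The Python sketch below applies this to the soda example. The counts (14 of 20 males and 8 of 20 females choosing brand A) are hypothetical, made up for illustration, and the function names are our own.

```python
def expected_frequencies(table):
    """Expected counts under no relationship: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical counts: rows = males, females; columns = brand A, brand B.
soda = [[14, 6],
        [8, 12]]

print(expected_frequencies(soda))  # [[11.0, 9.0], [11.0, 9.0]]
print(pearson_chi_square(soda))    # about 3.64 (exactly 40/11)
```

Both sexes are expected to choose brand A 11 times out of 20, because 22 of the 40 respondents chose it overall; the statistic grows with every cell's departure from that pattern.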

The value of the Chi-square and its significance level depend on the overall number of observations and the number of cells in the table. Consistent with the principles discussed in Elementary concepts, relatively small deviations of the relative frequencies across cells from the expected pattern will prove significant if the number of observations is large.
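The role of the table size and the number of observations can be sketched in Python. For an r x c table the degrees of freedom are (r - 1)(c - 1), and the significance level is the upper-tail probability of the chi-square distribution with those degrees of freedom. To stay self-contained, this sketch computes that tail probability with a recurrence on the regularized incomplete gamma function rather than calling a statistics library; the function names and the hypothetical counts are assumptions for illustration. Multiplying every count by 10 leaves the relative frequencies unchanged but multiplies the Chi-square by 10.

```python
import math


def chi_square_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), where Q is the
    regularized upper incomplete gamma function, a = df / 2 and y = x / 2.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    else:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return q


def chi_square_test(table):
    """Return (statistic, degrees of freedom, p-value) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = sum(
        (table[i][j] - r * c / n) ** 2 / (r * c / n)
        for i, r in enumerate(row_totals)
        for j, c in enumerate(col_totals)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi_square_sf(stat, df)


small = [[14, 6], [8, 12]]                        # hypothetical, N = 40
large = [[10 * x for x in row] for row in small]  # same proportions, N = 400

print(chi_square_test(small))
print(chi_square_test(large))
```

With these hypothetical counts, N = 40 gives a Chi-square of about 3.64 with 1 degree of freedom, which falls short of significance at the .05 level, while the same relative frequencies with N = 400 give about 36.4, significant beyond the .001 level.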

The only assumption underlying the use of the Chi-square (other than random selection of the sample) is that the expected frequencies are not very small. The reason is that the Chi-square inherently tests the underlying probabilities in each cell; when the expected cell frequencies fall below about 5, for example, those probabilities cannot be estimated with sufficient precision. For further discussion of this issue refer to Everitt (1977), Hays (1988), or Kendall and Stuart (1979).
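A simple check of this assumption is to compute the expected frequencies and flag any cell that falls below a threshold. The Python sketch below uses the cutoff of 5 mentioned above as its default; the function names and the small table are hypothetical, and the cutoff is treated as a rule of thumb rather than a fixed requirement.

```python
def expected_frequencies(table):
    """Expected counts under no relationship: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def low_expected_cells(table, threshold=5.0):
    """List (row, column, expected) for every cell whose expected frequency is below threshold."""
    expected = expected_frequencies(table)
    return [
        (i, j, e)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < threshold
    ]


soda = [[14, 6], [8, 12]]  # hypothetical, N = 40
tiny = [[3, 1], [2, 4]]    # hypothetical, N = 10

print(low_expected_cells(soda))  # []
print(low_expected_cells(tiny))  # [(0, 0, 2.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 3.0)]
```

In the 40-respondent table every expected frequency is 9 or more, whereas with only 10 observations every cell is expected to hold 3 or fewer, so the Chi-square would rest on poorly estimated cell probabilities.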