Correspondence Analysis - Computational Details

The notation used in this section closely follows that of Greenacre (1984); refer to Greenacre (1984) for a detailed discussion of the computations involved.

Notation.  The computations are based on the following matrices:

P

is the matrix of relative frequencies, i.e., each element of P is computed as the respective frequency from the input table, divided by the grand total of all values.

r

is the vector of row totals of P.

c

is the vector of column totals of P.

Dr

is a diagonal matrix whose diagonal elements are equal to the row totals of P (the elements of r).

Dc

is a diagonal matrix whose diagonal elements are equal to the column totals of P (the elements of c).
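As an illustration of the notation, the following minimal sketch builds P, r, c, Dr, and Dc with NumPy. The 3 x 3 table N is made up for the example and is not taken from the program or from Greenacre (1984).

```python
import numpy as np

# Hypothetical 3 x 3 frequency table (illustrative counts only).
N = np.array([[20.0, 10.0, 5.0],
              [10.0, 25.0, 10.0],
              [5.0, 10.0, 30.0]])

P = N / N.sum()      # relative frequencies: each count divided by the grand total
r = P.sum(axis=1)    # vector of row totals of P
c = P.sum(axis=0)    # vector of column totals of P
Dr = np.diag(r)      # diagonal matrix holding the row totals
Dc = np.diag(c)      # diagonal matrix holding the column totals
```

In practice the diagonal matrices rarely need to be formed explicitly; multiplying or dividing by r and c with broadcasting gives the same results.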

Singular value decomposition. The computation of the row and column coordinates is based on the generalized singular value decomposition of P, as:

P = A DuB'

so that

A' inverse(Dr)A = B' inverse(Dc)B = I

where A is the matrix of the left-side generalized singular vectors, B is the matrix of the right-side generalized singular vectors, Du is a diagonal matrix with the diagonal elements equal to the generalized singular values, and I stands for the identity matrix (a diagonal matrix with 1's in the diagonal).
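The decomposition can be sketched with an ordinary SVD. This is one common route, not necessarily the algorithm the program uses: the sketch assumes that the trivial dimension (singular value 1, with singular vectors r and c) is removed by subtracting rc' from P before decomposing, so that P - rc' = A Du B' over the remaining dimensions. The table N is hypothetical.

```python
import numpy as np

# Hypothetical frequency table (illustrative counts only).
N = np.array([[20.0, 10.0, 5.0],
              [10.0, 25.0, 10.0],
              [5.0, 10.0, 30.0]])
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)

# Standardized residuals: inverse(Dr)^(1/2) (P - rc') inverse(Dc)^(1/2).
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, mu, Vt = np.linalg.svd(S, full_matrices=False)

# At most min(rows, columns) - 1 dimensions are non-trivial after centering.
k = min(P.shape) - 1
A = np.sqrt(r)[:, None] * U[:, :k]    # left generalized singular vectors
B = np.sqrt(c)[:, None] * Vt[:k].T    # right generalized singular vectors
Du = np.diag(mu[:k])                  # generalized singular values

# Check the normalization A' inverse(Dr) A = B' inverse(Dc) B = I.
print(np.allclose(A.T @ np.diag(1.0 / r) @ A, np.eye(k)))
print(np.allclose(B.T @ np.diag(1.0 / c) @ B, np.eye(k)))
```

Scaling the ordinary singular vectors by the square roots of r and c is what turns the usual orthonormality into the weighted normalization shown above.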

Coordinates for row and column points. The computation of the coordinates for row and column points depends on the option button you select in the Standardization of Coordinates group box on the Options tab of the Correspondence Analysis Results dialog:

Row & column profiles. When you select the Row & column profiles option button, the row coordinates are computed based on the row profile matrix R = inverse(Dr)P, and the column coordinates are computed based on the column profile matrix computed analogously. Specifically, the row coordinates are computed as F = inverse(Dr)ADu, and the column coordinates as G = inverse(Dc)BDu. This option is appropriate when you are interested in interpreting both the distances between row points and the distances between column points (the distances in both coordinate systems for row points and column points are Chi-square distances). However, note that, as discussed in the Introductory Overview, distances between column and row points are not meaningful.

Canonical standardization. When you select the Canonical standardization option button, the row coordinates are computed as F = inverse(Dr)A(Du)½, and the column coordinates as G = inverse(Dc)B(Du)½. For details concerning this standardization, see Gifi (1981).

Row profiles (interpret row dist.). When you select the Row profiles (interpret row dist.) option button,  the row coordinates are computed based on the row profile matrix R = inverse(Dr)P. Specifically, the (principal) row coordinates are computed as F = inverse(Dr)ADu, and the standard column coordinates as G = inverse(Dc)B. This option is appropriate when you are interested in interpreting the distances between row points; the column coordinates should not be interpreted.

Column profiles (interpret col. dist.). When you select the Column profiles (interpret col. dist.) option button, or when reviewing the results for column points in multiple correspondence analysis, the column coordinates are computed based on the column profile matrix. Specifically, the (principal) column coordinates are computed as G = inverse(Dc)BDu, and the standard row coordinates as F = inverse(Dr)A. This option is appropriate when you are interested in interpreting the distances between column points; the row coordinates should not be interpreted.
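The four standardizations differ only in how the singular values are applied to inverse(Dr)A and inverse(Dc)B. The sketch below makes that explicit. The function names and the option strings ("row_and_column", "canonical", "row", "column") are hypothetical labels chosen for this example, and the decomposition assumes that rc' is subtracted from P to remove the trivial dimension.

```python
import numpy as np

def gsvd(N):
    """Return r, c, A, B and the singular values for a frequency table N."""
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, mu, Vt = np.linalg.svd(S, full_matrices=False)
    k = min(N.shape) - 1                      # non-trivial dimensions
    A = np.sqrt(r)[:, None] * U[:, :k]
    B = np.sqrt(c)[:, None] * Vt[:k].T
    return r, c, A, B, mu[:k]

def coordinates(N, option="row_and_column"):
    """Row coordinates F and column coordinates G for one standardization."""
    r, c, A, B, mu = gsvd(N)
    std_rows = A / r[:, None]                 # inverse(Dr) A
    std_cols = B / c[:, None]                 # inverse(Dc) B
    if option == "row_and_column":            # both sets principal
        return std_rows * mu, std_cols * mu
    if option == "canonical":                 # both scaled by (Du)^(1/2)
        return std_rows * np.sqrt(mu), std_cols * np.sqrt(mu)
    if option == "row":                       # principal rows, standard columns
        return std_rows * mu, std_cols
    if option == "column":                    # standard rows, principal columns
        return std_rows, std_cols * mu
    raise ValueError(f"unknown option: {option}")

# Hypothetical frequency table (illustrative counts only).
N = np.array([[20.0, 10.0, 5.0],
              [10.0, 25.0, 10.0],
              [5.0, 10.0, 30.0]])
F, G = coordinates(N, "row_and_column")
```

Multiplying by the vector mu scales each column of the coordinate matrix, which is equivalent to post-multiplying by the diagonal matrix Du. With all dimensions retained, Euclidean distances between rows of F under the first option equal the Chi-square distances between the corresponding row profiles.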

"Model" equation. When using the default method of standardization, the following "model" on P in k dimensions shows how the relative frequencies are approximated:

P ≈ rc' + DrF inverse(Du)G'Dc

In this formula F and G stand for the row and column coordinates, respectively.
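The approximation can be checked numerically. The sketch below assumes that the default standardization is Row & column profiles, so that F = inverse(Dr)ADu and G = inverse(Dc)BDu, and that the trivial dimension is carried by the rc' term. The table N and the helper name reconstruct are hypothetical.

```python
import numpy as np

# Hypothetical frequency table (illustrative counts only).
N = np.array([[20.0, 10.0, 5.0],
              [10.0, 25.0, 10.0],
              [5.0, 10.0, 30.0]])
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
E = np.outer(r, c)                                   # rc'

U, mu, Vt = np.linalg.svd((P - E) / np.sqrt(E), full_matrices=False)
n_dims = min(N.shape) - 1
mu = mu[:n_dims]
F = U[:, :n_dims] / np.sqrt(r)[:, None] * mu         # inverse(Dr) A Du
G = Vt[:n_dims].T / np.sqrt(c)[:, None] * mu         # inverse(Dc) B Du

def reconstruct(k):
    """rc' + Dr F inverse(Du) G' Dc, using only the first k dimensions."""
    DrF = r[:, None] * F[:, :k]                      # Dr F
    DcG = c[:, None] * G[:, :k]                      # Dc G (transposed below)
    return E + (DrF / mu[:k]) @ DcG.T

# With every dimension retained the model reproduces P exactly.
print(np.allclose(reconstruct(n_dims), P))

# With fewer dimensions, the Chi-square weighted error equals the sum of
# the squared singular values of the dimensions that were dropped.
err = ((P - reconstruct(1)) ** 2 / E).sum()
print(np.isclose(err, (mu[1:] ** 2).sum()))
```

Setting k = 0 returns rc' alone, which corresponds to the table expected under independence of rows and columns.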

Computation of quality and inertia. Note that the choice of the standardization method does not affect the computation of the quality and inertia values reported in the spreadsheet that is displayed when you click the Row and column coordinates button on the Advanced tab of the Correspondence Analysis Results dialog. Those values are always computed based on the Row and column profiles standardization.

Specifically, define the diag(x) operator as setting the elements of vector x into the diagonal of a diagonal matrix; define the square(X) operator as squaring each element in matrix or vector X; then the partial contributions of the row and column points to the total inertia are computed as inverse(Dr)square(A) and inverse(Dc)square(B), respectively.

The quality (Cosine²) for the individual dimensions is computed as diag(inverse(square(ADu)1)) square(ADu) and diag(inverse(square(BDu)1)) square(BDu) for the row and column points, respectively, where 1 stands for a column vector with all elements equal to 1.

The inertia for the row and column points is computed as (1/t) inverse(Dr) square(ADu)1 and (1/t) inverse(Dc) square(BDu)1, respectively, where t stands for the total inertia.
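The three statistics above translate directly into elementwise operations. The sketch below is written for the row points; the column versions follow by replacing A and r with B and c. It assumes that the trivial dimension has been removed by subtracting rc' from P and that the total inertia t is the sum of the squared singular values. The table N and all variable names are hypothetical.

```python
import numpy as np

# Hypothetical frequency table (illustrative counts only).
N = np.array([[20.0, 10.0, 5.0],
              [10.0, 25.0, 10.0],
              [5.0, 10.0, 30.0]])
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
E = np.outer(r, c)
U, mu, Vt = np.linalg.svd((P - E) / np.sqrt(E), full_matrices=False)
k = min(N.shape) - 1
mu = mu[:k]
A = np.sqrt(r)[:, None] * U[:, :k]

ADu = A * mu                     # A Du
sq = ADu ** 2                    # square(A Du)
row_sums = sq.sum(axis=1)        # square(A Du) 1
t = (mu ** 2).sum()              # total inertia (assumed: sum of squared singular values)

# Partial contributions of the row points to the inertia of each dimension:
# inverse(Dr) square(A). Each column sums to 1.
contributions = A ** 2 / r[:, None]

# Quality (Cosine^2) per dimension: diag(inverse(square(ADu)1)) square(ADu).
# Each row sums to 1 when all dimensions are retained.
cos2 = sq / row_sums[:, None]

# Relative inertia of each row point: (1/t) inverse(Dr) square(ADu) 1.
relative_inertia = row_sums / r / t

print(contributions.sum(axis=0), cos2.sum(axis=1), relative_inertia.sum())
```

A row whose profile equals the average profile would have a zero row sum, so a guard against division by zero in the Cosine² step would be needed in general code.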

Supplementary points, simple correspondence analysis. The computation of the coordinates for supplementary row and column points depends on the option button you select in the Standardization of Coordinates group box on the Options tab of the Correspondence Analysis Results dialog. Let Rs and Cs be the matrices of relative row or column frequencies (profiles) for the supplementary rows and columns, respectively. The supplementary row and column coordinates are then computed as follows:

Row & column profiles. When you select the Row & column profiles option button, the supplementary row and column coordinates are computed as Rs inverse(Dc)B and Cs inverse(Dr) A, respectively.

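Canonical standardization. When you select the Canonical standardization option button, the supplementary row and column coordinates are computed as Rs inverse(Dc)B inverse((Du)½) and Cs inverse(Dr) A inverse((Du)½), respectively, so that a supplementary point whose profile equals that of an active point receives the same coordinates as that active point.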

Row profiles (interpret row dist.). When you select the Row profiles (interpret row dist.) option button, the supplementary row and column coordinates are computed as Rs inverse(Dc)B and Cs inverse(Dr) A inverse(Du), respectively.

Column profiles (interpret col. dist.). When you select the Column profiles (interpret col. dist.) option button, the supplementary row and column coordinates are computed as Rs inverse(Dc) B inverse(Du) and Cs inverse(Dr) A, respectively.
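A useful sanity check on the supplementary formulas is that an active point, treated as if it were supplementary, lands on its own coordinates. The sketch below shows this for the Row & column profiles and Row profiles options. It assumes that Rs and Cs hold profiles (each row sums to 1) and that the trivial dimension has been removed by subtracting rc' from P. The table N and the profile in Rs are made up for the example.

```python
import numpy as np

# Hypothetical frequency table (illustrative counts only).
N = np.array([[20.0, 10.0, 5.0],
              [10.0, 25.0, 10.0],
              [5.0, 10.0, 30.0]])
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
E = np.outer(r, c)
U, mu, Vt = np.linalg.svd((P - E) / np.sqrt(E), full_matrices=False)
k = min(N.shape) - 1
mu = mu[:k]
A = np.sqrt(r)[:, None] * U[:, :k]
B = np.sqrt(c)[:, None] * Vt[:k].T

# Hypothetical supplementary row, given as a profile over the 3 columns.
Rs = np.array([[0.5, 0.3, 0.2]])

# Row & column profiles option: Rs inverse(Dc) B.
F_sup = (Rs / c) @ B

# Active rows projected the same way reproduce F = inverse(Dr) A Du.
R_active = P / r[:, None]
print(np.allclose((R_active / c) @ B, A / r[:, None] * mu))

# Row profiles option, supplementary columns: Cs inverse(Dr) A inverse(Du).
# Active column profiles reproduce the standard coordinates inverse(Dc) B.
C_active = (P / c).T
print(np.allclose((C_active / r) @ A / mu, B / c[:, None]))
```

Dividing Rs by c (or Cs by r) uses broadcasting across columns, which is equivalent to post-multiplying by inverse(Dc) (or inverse(Dr)).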

Supplementary points, multiple correspondence analysis.  In multiple correspondence analysis, supplementary column coordinates are computed as Cs inverse(Dr) A inverse(Du).