Input Formats in Correspondence Analysis - Raw Data

If the Raw data (requires tabulation) option button is selected [in the Input group box on either the Correspondence Analysis (CA): Table Specifications Startup Panel - Correspondence Analysis (CA) tab or the Multiple Correspondence Analysis (MCA): Table Specifications Startup Panel - Multiple Correspondence Analysis (MCA) tab], STATISTICA expects as input categorical (grouping) variables whose codes uniquely identify the category to which each case belongs. STATISTICA then tabulates those variables to compute the input table. For example, the variables may contain the following codes:

STAFFGRP   SMOKING
Sr.Manag   None
Sr.Manag   Light
Sr.Manag   Medium
Sr.Manag   Heavy
Jr.Manag   None
Jr.Manag   Light
Jr.Manag   Medium
Jr.Manag   Heavy
Sr.Empl    None
Sr.Empl    Light
Sr.Empl    Medium
.......    .......
.......    .......
.......    .......

If you selected variables StaffGrp and Smoking for the analysis, STATISTICA would cross-tabulate those variables and compute the two-way frequency table (see also Basic Statistics for a discussion of crosstabulation tables).
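The tabulation step can be sketched in a few lines of Python. This is only an illustration of cross-tabulating two coded variables, not STATISTICA's implementation; the case list and the helper name crosstab are made up for the example.

```python
from collections import Counter

# Hypothetical raw data: one (StaffGrp, Smoking) pair of codes per case.
cases = [
    ("Sr.Manag", "None"), ("Sr.Manag", "Light"), ("Jr.Manag", "None"),
    ("Jr.Manag", "Heavy"), ("Jr.Manag", "Heavy"), ("Sr.Empl", "Medium"),
    ("Sr.Empl", "None"), ("Sr.Empl", "None"),
]

def crosstab(pairs, row_codes, col_codes):
    """Count how many cases fall into each (row code, column code) cell."""
    counts = Counter(pairs)  # missing combinations count as 0
    return [[counts[(r, c)] for c in col_codes] for r in row_codes]

# The codes define which categories appear in the table, and in which order.
row_codes = ["Sr.Manag", "Jr.Manag", "Sr.Empl"]
col_codes = ["None", "Light", "Medium", "Heavy"]
table = crosstab(cases, row_codes, col_codes)

print(f"{'':<10}", *(f"{c:>7}" for c in col_codes))
for code, row in zip(row_codes, table):
    print(f"{code:<10}", *(f"{n:>7}" for n in row))
```

Each cell of the resulting table holds the number of cases carrying that pair of codes, which is the two-way frequency table that the correspondence analysis takes as input.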

Selection of variables and codes for simple correspondence analysis. To specify a simple correspondence analysis, click the Row and column variable(s) button to display the standard variable selection dialog. If you select one row variable and one column variable, then the analysis will be performed on the two-way table defined by the categories for the two variables. Click the Codes for grouping variables button to display the Select Codes for Coding Variables dialog, in which you enter the codes (numbers or text values) that define the categories for the selected variables. If more than one variable was selected for the list of row or column variables, then all combinations of the categories of the selected variables in one list (e.g., rows) will be crosstabulated against the respective combinations of categories for the variables in the other list (e.g., columns). For example, in the following two-way table, the combinations of categories for the two column variables Age and Survival were tabulated against the combinations of categories for the two row variables Inflammation and Location.

               Age:     under 50    50 to 69     over 69
          Survival:     No   Yes    No   Yes    No   Yes
Inflamm.  Location
MIN_MAL   TOKYO          9    26     9    20     2     1
MIN_MAL   BOSTON         6    11     8    18     9    15
MIN_MAL   GLAMORGN      16    16    14    27     3    12
MIN_BEGN  TOKYO          7    68     9    46     3     6
MIN_BEGN  BOSTON         7    24    20    58    18    26
MIN_BEGN  GLAMORGN       7    20    12    39     7    11
GRT_MAL   TOKYO          4    25    11    18     1     5
GRT_MAL   BOSTON         6     4     3    10     3     1
GRT_MAL   GLAMORGN       3     8     3    10     3     4
GRT_BEGN  TOKYO          3     9     2     5     0     1
GRT_BEGN  BOSTON         0     0     2     3     0     1
GRT_BEGN  GLAMORGN       0     1     0     4     0     1

In effect, the resulting table is a 4-way table, where the combinations of categories for the row and column variables are arranged to form a two-way table for the correspondence analysis.
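As a rough sketch of how several row variables and several column variables are folded into a single two-way table, the Python below crosses every combination of row categories with every combination of column categories. The six cases and the reduced sets of codes are invented for illustration; only the variable names come from the example above.

```python
from collections import Counter
from itertools import product

# Hypothetical cases: (Inflammation, Location, Age, Survival) codes.
cases = [
    ("MIN_MAL", "TOKYO", "under 50", "Yes"),
    ("MIN_MAL", "TOKYO", "under 50", "Yes"),
    ("MIN_MAL", "BOSTON", "50 to 69", "No"),
    ("GRT_MAL", "TOKYO", "under 50", "No"),
    ("GRT_MAL", "BOSTON", "50 to 69", "Yes"),
    ("GRT_MAL", "BOSTON", "50 to 69", "Yes"),
]

# Codes selected for the two row variables and the two column variables.
row_vars = [["MIN_MAL", "GRT_MAL"], ["TOKYO", "BOSTON"]]  # Inflammation, Location
col_vars = [["under 50", "50 to 69"], ["No", "Yes"]]      # Age, Survival

# Every combination of row categories becomes one row of the table,
# every combination of column categories becomes one column.
row_keys = list(product(*row_vars))
col_keys = list(product(*col_vars))

# Split each case into its row part and column part, then count.
counts = Counter((case[:2], case[2:]) for case in cases)
table = [[counts[(r, c)] for c in col_keys] for r in row_keys]

for key, row in zip(row_keys, table):
    print(f"{' / '.join(key):<20}", *(f"{n:>3}" for n in row))
```

With two variables on each side the table still has only two dimensions; the four variables are encoded in the composite row and column labels.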

Selection of variables and codes for multiple correspondence analysis. To specify a multiple correspondence analysis, click the Variables (Factors in Burt Table) button to display the standard variable selection dialog, in which you select variables for the analysis. The Burt table (see also MCA Introductory Overview) will be computed for the categories of the selected variables. Click the Codes for grouping variables button to display the Select Codes for Coding Variables dialog, in which you enter the codes (numbers or text values) that define the categories for the selected variables. For example, suppose you selected variables Survival (Yes, No), Age (<50, 50-69, and 69+), and Location (Tokyo, Boston, and Glamorgn) for the analysis. The program would compute the following type of Burt table for the multiple correspondence analysis.

 

                             Survival                        Age                   Location
                          NO      YES      <50    50-69      69+    TOKYO   BOSTON GLAMORGN
SURVIVAL:NO              210        0       68       93       49       60       82       68
SURVIVAL:YES               0      554      212      258       84      230      171      153

AGE:UNDER_50              68      212      280        0        0      151       58       71
AGE:A_50TO69              93      258        0      351        0      120      122      109
AGE:OVER_69               49       84        0        0      133       19       73       41

LOCATION:TOKYO            60      230      151      120       19      290        0        0
LOCATION:BOSTON           82      171       58      122       73        0      253        0
LOCATION:GLAMORGN         68      153       71      109       41        0        0      221

The Burt table has a clearly defined structure. Overall, the data matrix is symmetrical. In the case of 3 categorical variables, the data matrix consists of 3 x 3 = 9 partitions, created by each variable being tabulated against itself and against the categories of all other variables. Each diagonal partition (i.e., where a variable is tabulated against itself) holds the category frequencies of that variable on its diagonal and zeros elsewhere, so the sum of its diagonal elements is constant and equal to the total number of cases (764 in this case). Technically, the Burt table is the inner product of an indicator (design) matrix with itself; to analyze tables based on indicator matrices that incorporate fuzzy coding schemes, you can specify a Burt table directly as input (select the Frequencies w/out grouping vars option button in the Input group box of the Multiple Correspondence Analysis (MCA): Table Specifications dialog). Refer to MCA - Introductory Overview for additional details.
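The relationship between the indicator matrix and the Burt table can be checked with a short Python sketch. It expands the cell counts of the Age by Survival table shown earlier back into individual cases (ignoring Inflammation), builds a 0/1 indicator matrix Z, and forms the product Z'Z. The names Z, columns, and burt are chosen for the example, and the code illustrates the definition only; it makes no claim about how STATISTICA computes the table internally.

```python
SURVIVAL = ["NO", "YES"]
AGES = ["<50", "50-69", "69+"]
LOCATIONS = ["TOKYO", "BOSTON", "GLAMORGN"]

# Cell counts copied from the Age by Survival table shown earlier. Rows cycle
# through TOKYO, BOSTON, GLAMORGN within each Inflammation category; columns
# are (under 50, 50 to 69, over 69) x (No, Yes).
counts = [
    [9, 26, 9, 20, 2, 1], [6, 11, 8, 18, 9, 15], [16, 16, 14, 27, 3, 12],   # MIN_MAL
    [7, 68, 9, 46, 3, 6], [7, 24, 20, 58, 18, 26], [7, 20, 12, 39, 7, 11],  # MIN_BEGN
    [4, 25, 11, 18, 1, 5], [6, 4, 3, 10, 3, 1], [3, 8, 3, 10, 3, 4],        # GRT_MAL
    [3, 9, 2, 5, 0, 1], [0, 0, 2, 3, 0, 1], [0, 1, 0, 4, 0, 1],             # GRT_BEGN
]

# Expand the counts into one (Survival, Age, Location) record per case.
cases = []
for i, row in enumerate(counts):
    for j, n in enumerate(row):
        cases.extend([(SURVIVAL[j % 2], AGES[j // 2], LOCATIONS[i % 3])] * n)

# One indicator column per category: (position of the variable in a record, code).
columns = (
    [(0, code) for code in SURVIVAL]
    + [(1, code) for code in AGES]
    + [(2, code) for code in LOCATIONS]
)

# Indicator (design) matrix Z: 1 where the case belongs to the category, else 0.
Z = [[int(case[var] == code) for var, code in columns] for case in cases]

# Burt table B = Z'Z: B[a][b] counts the cases that fall in both category a
# and category b.
k = len(columns)
burt = [[sum(z[a] * z[b] for z in Z) for b in range(k)] for a in range(k)]

for (var, code), row in zip(columns, burt):
    print(f"{code:<9}", *(f"{n:>5}" for n in row))
```

Because every case belongs to exactly one category of each variable, each diagonal partition sums to the number of cases (764 here), and the printed rows can be compared directly against the Burt table above.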

In addition to the variables defining the table for the analysis, you can designate some variables as Supplementary columns (variables). Note that unlike in simple correspondence analysis, where supplementary columns and rows can be added from the Correspondence Analysis Results - Supplementary points tab, multiple correspondence analysis requires that the supplementary columns also define a valid Burt table. Therefore, in this case, first click the Variables (Factors in Burt table) button to specify all variables for the analysis, and then click the Supplementary columns (variables) button to select the subset of those variables that are to be treated as supplementary columns. The variables selected as supplementary columns are not used in the computation of eigenvalues and eigenvectors (see Computational Details), but coordinate values are computed for those columns and reported in the spreadsheet and plots of coordinates.