GLM Introductory Overview - Sigma-Restricted and Overparameterized Model

Unlike the Multiple Regression model, which is usually applied to cases where the X variables are continuous, the general linear model is frequently applied to analyze any ANOVA or MANOVA design with categorical predictor variables, any ANCOVA or MANCOVA design with both categorical and continuous predictor variables, and any multiple or multivariate regression design with continuous predictor variables. Categorical predictors must first be coded into numeric predictor variables. To illustrate, consider Gender, which is clearly a nominal-level variable (anyone who attempts to rank order the sexes on any dimension does so at his or her own peril in today's world). There are two basic methods by which Gender can be coded into one or more (non-offensive) predictor variables and analyzed using the general linear model.

Sigma-restricted model (coding of categorical predictors). Using the first method, males and females can be assigned any two arbitrary but distinct values on a single predictor variable. The values on the resulting predictor variable represent a quantitative contrast between males and females. Typically, the values corresponding to group membership are chosen not arbitrarily but to facilitate interpretation of the regression coefficient associated with the predictor variable. In one widely used strategy, cases in the two groups are assigned values of 1 and -1 on the predictor variable. If the regression coefficient for the variable is positive, the group coded as 1 will have the higher predicted value (i.e., the higher group mean) on the dependent variable; if the coefficient is negative, the group coded as -1 will have the higher predicted value. An additional advantage is that each group is coded exactly one unit from zero. Because a regression coefficient reflects the change in the dependent variable per unit change in the predictor variable, each group's predicted value lies one coefficient away from the intercept (above it for the group coded 1, below it for the group coded -1), so the difference between the two predicted group means is twice the coefficient. This coding strategy is aptly called the sigma-restricted parameterization, because the values used to represent group membership (1 and -1) sum to zero.

Note that the sigma-restricted parameterization of categorical predictor variables usually leads to X'X matrices of full rank, which therefore do not require a generalized inverse for solving the normal equations. Potentially redundant information, such as the characteristics of maleness and femaleness, is reduced to a full-rank representation by creating quantitative contrast variables that represent differences in those characteristics.
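A minimal sketch of the two-group case in Python with NumPy; the scores, group sizes, and variable names are hypothetical and for illustration only. It builds the design matrix X from an intercept column and a single ±1 Gender column, confirms that X'X has full rank, and solves the normal equations with an ordinary (not generalized) inverse via np.linalg.solve. It also shows how the coefficients relate to the group means: the intercept is the unweighted average of the two group means and the coefficient is half their difference, so the group coded 1 is predicted at b0 + b1 and the group coded -1 at b0 - b1.

```python
import numpy as np

# Hypothetical scores (illustration only): 3 females, 2 males
female_scores = np.array([6.0, 7.0, 8.0])
male_scores = np.array([4.0, 5.0])
y = np.concatenate([female_scores, male_scores])

# Sigma-restricted coding: females = 1, males = -1 on a single predictor
gender = np.concatenate([np.ones(female_scores.size), -np.ones(male_scores.size)])

# Design matrix X: intercept column plus the +/-1 contrast column
X = np.column_stack([np.ones_like(gender), gender])
XtX = X.T @ X

# X'X is 2 x 2 and of full rank, so its ordinary inverse exists
rank = np.linalg.matrix_rank(XtX)
print(f"rank of X'X: {rank} of {XtX.shape[0]}")

# Solve the normal equations (X'X) b = X'y directly; no generalized inverse needed
b0, b1 = np.linalg.solve(XtX, X.T @ y)

mean_f = female_scores.mean()
mean_m = male_scores.mean()
print(f"intercept b0 = {b0:.3f}  (unweighted mean of group means = {(mean_f + mean_m) / 2:.3f})")
print(f"slope     b1 = {b1:.3f}  (half the female-minus-male difference = {(mean_f - mean_m) / 2:.3f})")
print(f"predicted means: coded 1 -> {b0 + b1:.3f}, coded -1 -> {b0 - b1:.3f}")
```

Because the model has one parameter per group plus nothing else to fit, the predicted values reproduce the two group means exactly, even though the groups are of unequal size.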

As a further illustration, consider a model with one factor that has three levels, A, B, and C. Under the sigma-restricted parameterization, the factor would be coded into two predictor columns as follows:

Factor   Column A   Column B
A            1          0
B            0          1
C           -1         -1

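This parameterization leads to the interpretation that each coefficient reflects the difference between its level and the average of the other two levels. More precisely, the coefficient for A estimates the deviation of the mean of level A from the unweighted average of all three level means, which equals two-thirds of the difference between level A and the average of levels B and C; the coefficient for B is interpreted in the same way. Level C has no coefficient of its own: its deviation is minus the sum of the coefficients for A and B.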
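The three-level coding can be checked numerically with a short sketch, assuming Python with NumPy. The dictionary SIGMA_CODES mirrors the coding table for levels A, B, and C; the data, group sizes, and variable names are hypothetical. The sketch fits the model and compares the coefficients with the level means: the coefficient for A equals the mean of A minus the unweighted average of the three level means, which is two-thirds of the difference between level A and the average of levels B and C.

```python
import numpy as np

# Sigma-restricted coding for one factor with levels A, B, C (two columns).
# The last level, C, is coded -1 in every column, so each column sums to
# zero across the three levels.
SIGMA_CODES = {
    "A": [1.0, 0.0],
    "B": [0.0, 1.0],
    "C": [-1.0, -1.0],
}

# Hypothetical data (illustration only): unequal group sizes on purpose
levels = np.array(["A", "A", "B", "B", "B", "C", "C"])
y = np.array([10.0, 12.0, 7.0, 8.0, 9.0, 3.0, 5.0])

# Design matrix: intercept column followed by the two coded columns
X = np.array([[1.0] + SIGMA_CODES[lev] for lev in levels])

# X'X is 3 x 3 and of full rank, so the normal equations have a unique solution
b0, b_A, b_B = np.linalg.solve(X.T @ X, X.T @ y)

# Level means and their unweighted average
means = {lev: y[levels == lev].mean() for lev in "ABC"}
grand = (means["A"] + means["B"] + means["C"]) / 3

print(f"b0  = {b0:.4f}   unweighted mean of level means = {grand:.4f}")
print(f"b_A = {b_A:.4f}   mean(A) - grand = {means['A'] - grand:.4f}")
print(f"      2/3 * (mean(A) - average of means B and C) = "
      f"{2 / 3 * (means['A'] - (means['B'] + means['C']) / 2):.4f}")
# The implied effect of C is minus the sum of the A and B coefficients
print(f"-(b_A + b_B) = {-(b_A + b_B):.4f}   mean(C) - grand = {means['C'] - grand:.4f}")
```

Placing the -1 row on the last level is a convention chosen here for the sketch; any one level could play that role, and the remaining coefficients would then be interpreted relative to the same unweighted grand mean.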

Overparameterized model (coding of categorical predictors). The second basic method for recoding categorical predictors is the indicator variable approach. In this method, a separate predictor variable is coded for each group identified by a categorical predictor variable. To illustrate, females might be assigned a value of 1 and males a value of 0 on a first predictor variable identifying membership in the female Gender group, and males would then be assigned a value of 1 and females a value of 0 on a second predictor variable identifying membership in the male Gender group. Note that this method of recoding categorical predictor variables will almost always lead to X'X matrices with redundant columns (for example, when the model includes an intercept, the indicator columns for a factor sum to the intercept column), and thus requires a generalized inverse for solving the normal equations. For this reason, this method is often called the overparameterized model for representing categorical predictor variables: it results in more columns in X, and hence in X'X, than are necessary for determining the relationships of categorical predictor variables to responses on the dependent variables.
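The following sketch, assuming Python with NumPy and hypothetical data, codes a two-group Gender example in the overparameterized way: an intercept plus one indicator column per group. Because the two indicator columns add up to the intercept column, X'X is rank deficient. The normal equations are then solved with a generalized inverse; the sketch uses NumPy's Moore-Penrose pseudoinverse (np.linalg.pinv), which is only one possible choice, and a given GLM program may use a different generalized inverse that yields different individual coefficients. What does not depend on that choice is the predicted mean for each group.

```python
import numpy as np

# Hypothetical two-group data (illustration only): 3 females, 2 males
is_female = np.array([1.0, 1.0, 1.0, 0.0, 0.0])
y = np.array([6.0, 7.0, 8.0, 4.0, 5.0])

# Overparameterized coding: intercept plus one indicator column per group
X = np.column_stack([
    np.ones_like(is_female),   # intercept
    is_female,                 # 1 = female, 0 = male
    1.0 - is_female,           # 1 = male, 0 = female
])
XtX = X.T @ X

# The two indicator columns sum to the intercept column, so X'X is singular
rank = np.linalg.matrix_rank(XtX)
print(f"rank of X'X: {rank} of {XtX.shape[0]}")

# Solve the normal equations with a generalized inverse
# (here the Moore-Penrose pseudoinverse, one of many possible choices)
b = np.linalg.pinv(XtX) @ X.T @ y
print("one solution b =", np.round(b, 4))

# Individual coefficients are not unique, but predicted group means are
pred_female = b[0] + b[1]
pred_male = b[0] + b[2]
print(f"predicted female mean: {pred_female:.3f}")
print(f"predicted male mean:   {pred_male:.3f}")
```

The pseudoinverse happens to return the minimum-norm solution among all solutions of the normal equations; adding any constant to the intercept and subtracting it from both indicator coefficients gives another equally valid solution with the same predicted group means.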

True to its description as general, the general linear model can be used to perform analyses with categorical predictor variables that are coded using either of the two basic methods that have been described.


A detailed discussion of univariate and multivariate ANOVA techniques can also be found in the Introductory Overview section of the ANOVA/MANOVA module; a discussion of multiple regression methods is provided in the Multiple Regression Overviews.
