 GLM Hypothesis Testing - Six Types of Sums of Squares

When a model contains categorical predictors arranged in a factorial ANOVA design, one is typically interested in the main effects of, and the interaction effects between, those predictors. However, when the design is not balanced (it has unequal cell n's, and consequently the coded effects for the categorical factors are usually correlated), or when a full factorial ANOVA design has missing cells, it becomes ambiguous which specific comparisons between the (population, or least-squares) cell means constitute the main effects and interactions of interest. These issues are discussed in great detail in Milliken and Johnson (1986); if you routinely analyze incomplete factorial designs, you should consult their discussion of the various problems and the approaches to solving them.
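To see why unequal cell n's make the coded effects correlated, the minimal sketch below codes a hypothetical 2 x 2 design with +1/-1 (sigma-restricted) codes for factors A and B and computes the correlation between the two coded columns. The cell sizes and the helper names `coded_columns` and `correlation` are illustrative assumptions, not part of STATISTICA.

```python
# Minimal sketch with hypothetical cell sizes: +1/-1 codes for a 2 x 2 design.
def coded_columns(cell_n):
    """Return +1/-1 codes for A and B, one entry per observation.

    cell_n maps (level_a, level_b) -> number of observations; levels are 0 or 1.
    """
    a, b = [], []
    for (i, j), n in sorted(cell_n.items()):
        a.extend([1 if i == 0 else -1] * n)
        b.extend([1 if j == 0 else -1] * n)
    return a, b


def correlation(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


balanced = {(0, 0): 5, (0, 1): 5, (1, 0): 5, (1, 1): 5}
unbalanced = {(0, 0): 8, (0, 1): 2, (1, 0): 3, (1, 1): 7}

print(round(correlation(*coded_columns(balanced)), 3))    # 0.0
print(round(correlation(*coded_columns(unbalanced)), 3))  # 0.503
```

With equal cell n's the A and B columns are uncorrelated; with the unequal n's above they correlate at about 0.5, so part of the variation the A column can explain is shared with B, and the sum of squares attributed to A depends on which other effects are already in the model.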

STATISTICA GLM implements the widely used methods commonly labeled Type I, II, III, and IV sums of squares (see Goodnight, 1980). In addition, we offer two further methods for testing effects in incomplete designs, which are widely used in other areas (and traditions) of research.
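One way to see how Types I, II, and III differ is to compute each as a difference in residual sums of squares (RSS) between nested least-squares fits. The sketch below assumes a hypothetical 2 x 2 design with no empty cells, +1/-1 effect coding, and invented data: Type I enters A first, Type II adjusts A for B, and Type III adjusts A for both B and the A x B interaction. Type IV is not shown because it coincides with Type III when no cells are missing. The helper names (`solve`, `rss`, `sums_of_squares_for_a`) are illustrative, and this is not STATISTICA's implementation.

```python
# Sketch: Type I, II and III sums of squares for factor A in a 2 x 2 design,
# each computed as RSS(reduced model) - RSS(larger model). Data are hypothetical.

def solve(m, v):
    """Solve the square linear system m x = v by Gauss-Jordan elimination."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))  # partial pivoting
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def rss(columns, y):
    """Residual sum of squares of y regressed on the given columns (normal equations)."""
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(p * q for p, q in zip(c, y)) for c in columns]
    beta = solve(xtx, xty)
    return sum(v * v for v in y) - sum(b * t for b, t in zip(beta, xty))


def sums_of_squares_for_a(cells):
    """cells maps (level_a, level_b) -> list of responses; levels are 0 or 1."""
    y, a, b = [], [], []
    for (i, j), values in sorted(cells.items()):
        for value in values:
            y.append(float(value))
            a.append(1.0 if i == 0 else -1.0)
            b.append(1.0 if j == 0 else -1.0)
    one = [1.0] * len(y)
    ab = [p * q for p, q in zip(a, b)]
    type1 = rss([one], y) - rss([one, a], y)                # A entered first
    type2 = rss([one, b], y) - rss([one, a, b], y)          # A adjusted for B
    type3 = rss([one, b, ab], y) - rss([one, a, b, ab], y)  # A adjusted for B and AB
    return type1, type2, type3


unbalanced = {(0, 0): [4, 6], (0, 1): [9, 11, 10], (1, 0): [6, 8, 7], (1, 1): [8, 10]}
balanced = {(0, 0): [4, 6], (0, 1): [9, 11], (1, 0): [6, 8], (1, 1): [8, 10]}

for name, cells in (("unbalanced", unbalanced), ("balanced", balanced)):
    t1, t2, t3 = sums_of_squares_for_a(cells)
    print(f"{name}: Type I={t1:.4f}  Type II={t2:.4f}  Type III={t3:.4f}")
```

With these made-up data the unbalanced layout gives Type I SS(A) = 0.1 but Type III SS(A) = 0.6, because the two types test different hypotheses about the cell means (weighted versus unweighted marginal means). In the balanced layout the coded columns are orthogonal, so all three types coincide at 0.5.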

Type V sums of squares. First, we propose the term Type V sums of squares to denote the approach that is widely used in industrial experimentation to analyze fractional factorial designs; these designs are discussed in detail in the 2^(k-p) Fractional Factorial Designs section of the Introductory Overview to the Experimental Design module. In effect, for those effects for which tests are performed, all population marginal means (least squares means) are estimable.
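The estimability issue that motivates Type V sums of squares is easy to see in a small fractional factorial. The sketch below builds a hypothetical 2^(3-1) half fraction with generator C = AB and lists which effect columns are identical (aliased). It only illustrates aliasing; it does not reproduce how STATISTICA computes Type V sums of squares, and the helper names are illustrative.

```python
from itertools import product

# Hypothetical 2^(3-1) half fraction: a full 2^2 in A and B, with C generated as C = A*B.
runs = [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]


def column(effect):
    """+1/-1 contrast column for an effect written as factor letters, e.g. 'AB'."""
    idx = {"A": 0, "B": 1, "C": 2}
    col = []
    for run in runs:
        value = 1
        for letter in effect:
            value *= run[idx[letter]]
        col.append(value)
    return col


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


effects = ["A", "B", "C", "AB", "AC", "BC"]
for e in effects:
    aliases = [f for f in effects if f != e and column(f) == column(e)]
    print(e, "aliased with", aliases)

print("ABC column:", column("ABC"))  # constant, i.e. aliased with the intercept
print("A . B =", dot(column("A"), column("B")))  # main effects are orthogonal
```

Each main effect shares its column with a two-factor interaction (A with BC, B with AC, C with AB), so these four runs cannot separate an effect from its alias. Only effects whose means remain estimable in the fraction can be tested.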

Type VI sums of squares. Second, in keeping with the Type I through IV labeling convention, we propose the term Type VI sums of squares to denote the approach that is often used in programs that implement only the sigma-restricted model (which is not well suited for certain types of designs; GLM lets the user choose between the sigma-restricted and overparameterized models). This approach is identical to what Hocking (1996) describes as the effective hypothesis method.
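The sigma-restricted and overparameterized models mentioned above differ in how they code a categorical factor. The sketch below builds both design matrices for a hypothetical 2 x 3 layout (one row per cell, main effects only) and computes their ranks exactly with `fractions.Fraction`. It illustrates the two codings only; it is not an implementation of Type VI (effective hypothesis) sums of squares, and the helper names are illustrative.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) via exact Gauss-Jordan elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue  # no pivot in this column: it depends on earlier columns
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r


cells = [(i, j) for i in range(2) for j in range(3)]  # A has 2 levels, B has 3


def overparameterized(i, j):
    # intercept, one indicator per level of A, one indicator per level of B
    return [1] + [int(i == k) for k in range(2)] + [int(j == k) for k in range(3)]


def sigma_restricted(i, j):
    # intercept, A coded +1/-1, B coded with its last level set to -1 in every column
    a = 1 if i == 0 else -1
    b = [1, 0] if j == 0 else [0, 1] if j == 1 else [-1, -1]
    return [1, a] + b


x_over = [overparameterized(i, j) for i, j in cells]
x_sigma = [sigma_restricted(i, j) for i, j in cells]
print(len(x_over[0]), rank(x_over))    # 6 columns, rank 4
print(len(x_sigma[0]), rank(x_sigma))  # 4 columns, rank 4
```

The overparameterized matrix has six columns but rank 4, because the intercept equals both the sum of the A indicators and the sum of the B indicators; its X'X is therefore singular and is typically handled with a generalized inverse. The sigma-restricted matrix has full column rank.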

Balanced designs. Most between-subjects designs can be analyzed much more efficiently when they are balanced, that is, when all cells in the ANOVA design have equal n, when there are no missing cells, and, if nesting is present, when the nesting is balanced so that equal numbers of levels of the nested factors appear within each level of the factor(s) in which they are nested. In that case, the X'X matrix (where X is the design matrix) is diagonal, or block-diagonal when an effect is coded by several columns, and many of the computations needed to obtain the ANOVA results (such as matrix inversion) are greatly simplified. STATISTICA GLM contains an option to "instruct" the program that the design is balanced and that the more efficient computational methods can be used (an option on the Startup Panel, or an option used in conjunction with a keyword in GLM syntax). Even very large designs, with effects whose degrees of freedom number in the hundreds, can thus be analyzed in mere seconds, whereas the general computational procedures (which do not assume a balanced design) may take several minutes to accomplish the same.
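The computational advantage of balanced designs comes from the structure of X'X. The sketch below assumes sigma-restricted coding of a hypothetical 2 x 3 layout with interaction, builds X'X for a balanced version (two observations per cell) and an unbalanced version (five observations in one cell), and lists the nonzero cross-products between columns that code different effects. The helper names and cell sizes are illustrative assumptions.

```python
def row(i, j):
    """Sigma-restricted codes for cell (i, j) of a 2 x 3 layout: [I | A | B1 B2 | AB1 AB2]."""
    a = 1 if i == 0 else -1
    b = [1, 0] if j == 0 else [0, 1] if j == 1 else [-1, -1]
    return [1, a] + b + [a * b[0], a * b[1]]


# Which effect each column of row() belongs to.
EFFECT_OF_COLUMN = ["I", "A", "B", "B", "AB", "AB"]


def xtx(cell_n):
    """X'X for a design in which cell (i, j) has cell_n[(i, j)] observations."""
    x = [row(i, j) for (i, j), n in sorted(cell_n.items()) for _ in range(n)]
    k = len(x[0])
    return [[sum(r[p] * r[q] for r in x) for q in range(k)] for p in range(k)]


def off_block_entries(m):
    """Nonzero X'X entries linking columns that code different effects."""
    return [(p, q, m[p][q]) for p in range(len(m)) for q in range(len(m))
            if EFFECT_OF_COLUMN[p] != EFFECT_OF_COLUMN[q] and m[p][q] != 0]


balanced = {(i, j): 2 for i in range(2) for j in range(3)}
unbalanced = {**balanced, (0, 0): 5}

print(off_block_entries(xtx(balanced)))       # []
print(xtx(balanced)[2][3])                    # 4: the two B columns are not orthogonal
print(len(off_block_entries(xtx(unbalanced))) > 0)  # True
```

For the balanced design the list is empty, so X'X splits into separate blocks for the intercept, A, B, and A x B that can be inverted one at a time. The B block itself is not diagonal (its two columns have a cross-product of 4), which is why the matrix is block-diagonal rather than strictly diagonal when a factor has more than two levels. Changing a single cell size introduces cross-products between effects, and the general computations are then required.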
