Discriminant Function Analysis Introductory Overview - Stepwise Discriminant Analysis

Probably the most common application of discriminant function analysis is to include many measures in the study, in order to determine the ones that discriminate between groups. For example, an educational researcher interested in predicting high school graduates' choices for further education would probably include as many measures of personality, achievement motivation, academic performance, etc. as possible in order to learn which one(s) offer the best prediction.

Model. Put another way, we want to build a "model" of how we can best predict to which group a case belongs. In the following discussion we will use the term "in the model" in order to refer to variables that are included in the prediction of group membership, and we will refer to variables as being "not in the model" if they are not included.

Forward stepwise analysis. In stepwise discriminant function analysis, STATISTICA "builds" a model of discrimination step-by-step. Specifically, at each step STATISTICA reviews all variables and evaluates which one will contribute most to the discrimination between groups. That variable will then be included in the model, and STATISTICA proceeds to the next step.

Backward stepwise analysis. You can also step backwards; in that case STATISTICA first includes all variables in the model and then, at each step, eliminates the variable that contributes least to the prediction of group membership. Thus, as the result of a successful discriminant function analysis, one would only keep the "important" variables in the model, that is, those variables that contribute the most to the discrimination between groups.
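
The two stepping directions can be sketched in a few lines of Python. This is only an illustration, not STATISTICA's implementation: it assumes Wilks' lambda (the ratio of the within-group to the total generalized variance) as the measure of discrimination, and the function names wilks_lambda, forward_order, and backward_order, as well as the simulated data, are hypothetical. Stopping is controlled here by a fixed number of steps rather than by F values.

```python
import numpy as np

def wilks_lambda(X, y, cols):
    """Wilks' lambda, det(W) / det(T), for the variables listed in `cols`."""
    if len(cols) == 0:
        return 1.0  # no variables in the model: no discrimination
    y = np.asarray(y)
    Z = np.asarray(X, dtype=float)[:, cols]
    dev_total = Z - Z.mean(axis=0)
    T = dev_total.T @ dev_total        # total sums of squares and cross-products
    W = np.zeros_like(T)               # pooled within-group counterpart
    for group in np.unique(y):
        Zg = Z[y == group]
        dev_within = Zg - Zg.mean(axis=0)
        W += dev_within.T @ dev_within
    return np.linalg.det(W) / np.linalg.det(T)

def forward_order(X, y, n_steps):
    """Add, at each step, the variable that lowers Wilks' lambda the most."""
    in_model = []
    not_in_model = list(range(X.shape[1]))
    for _ in range(n_steps):
        best = min(not_in_model,
                   key=lambda j: wilks_lambda(X, y, in_model + [j]))
        in_model.append(best)
        not_in_model.remove(best)
    return in_model

def backward_order(X, y, n_keep):
    """Start with all variables; drop the least useful one at each step."""
    in_model = list(range(X.shape[1]))
    while len(in_model) > n_keep:
        # The least useful variable is the one whose removal leaves
        # Wilks' lambda of the remaining set smallest.
        drop = min(in_model,
                   key=lambda j: wilks_lambda(X, y, [k for k in in_model if k != j]))
        in_model.remove(drop)
    return in_model

# Simulated example (an assumption): two groups, four variables,
# of which only variable 2 separates the groups.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 4))
X[:, 2] += 3.0 * y

print("forward entry order:", forward_order(X, y, n_steps=2))
print("kept after backward elimination:", backward_order(X, y, n_keep=1))
```

In this sketch both directions should single out variable 2, because it is the only variable constructed to differ between the groups.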

F to enter, F to remove. The stepwise procedure is "guided" by the respective F to enter and F to remove values. The F value for a variable indicates its statistical significance in the discrimination between groups, that is, it is a measure of the extent to which a variable makes a unique contribution to the prediction of group membership. If you are familiar with stepwise multiple regression procedures (see Multiple Regression), then you may interpret the F to enter/remove values in the same way as in stepwise regression.

In general, STATISTICA continues to add variables to the model as long as the respective F values for those variables are larger than the user-specified F to enter; STATISTICA excludes (removes) variables from the model if their F values are smaller than the user-specified F to remove.
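
As a rough illustration of how such F values can guide the procedure, the following Python sketch uses the textbook partial F statistic based on Wilks' lambda: for a variable considered alongside p other variables, with n cases and g groups, F = ((n - g - p) / (g - 1)) * (1 - L) / L, where L is the ratio of Wilks' lambda with the variable to Wilks' lambda without it. Whether STATISTICA computes its F values in exactly this way is not stated here, so treat the formula, the function names (wilks_lambda, partial_f, stepwise), the threshold values 4.0 and 3.9, and the simulated data as assumptions.

```python
import numpy as np

def wilks_lambda(X, y, cols):
    """Wilks' lambda, det(W) / det(T), for the variables listed in `cols`."""
    if len(cols) == 0:
        return 1.0
    y = np.asarray(y)
    Z = np.asarray(X, dtype=float)[:, cols]
    dev_total = Z - Z.mean(axis=0)
    T = dev_total.T @ dev_total
    W = np.zeros_like(T)
    for group in np.unique(y):
        Zg = Z[y == group]
        dev_within = Zg - Zg.mean(axis=0)
        W += dev_within.T @ dev_within
    return np.linalg.det(W) / np.linalg.det(T)

def partial_f(X, y, others, j):
    """F for the unique contribution of variable j beyond the variables in `others`."""
    n = X.shape[0]
    g = len(np.unique(y))
    p = len(others)
    ratio = wilks_lambda(X, y, others + [j]) / wilks_lambda(X, y, others)
    return (n - g - p) / (g - 1) * (1.0 - ratio) / ratio

def stepwise(X, y, f_enter=4.0, f_remove=3.9, max_steps=50):
    """Forward stepwise selection with a removal check before each entry."""
    in_model = []
    for _ in range(max_steps):
        if in_model:
            # F to remove: contribution of each variable given the rest of the model.
            f_in = {j: partial_f(X, y, [k for k in in_model if k != j], j)
                    for j in in_model}
            worst = min(f_in, key=f_in.get)
            if f_in[worst] < f_remove:
                in_model.remove(worst)
                continue
        candidates = [j for j in range(X.shape[1]) if j not in in_model]
        if not candidates:
            break
        # F to enter: contribution of each candidate given the current model.
        f_out = {j: partial_f(X, y, in_model, j) for j in candidates}
        best = max(f_out, key=f_out.get)
        if f_out[best] <= f_enter:
            break  # no remaining variable exceeds F to enter
        in_model.append(best)
    return in_model

# Simulated example (an assumption): three groups, five variables,
# of which only variables 0 and 1 differ between groups.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 30)
X = rng.normal(size=(90, 5))
X[:, 0] += 2.0 * y
X[:, 1] += 1.5 * (y == 2)

selected = stepwise(X, y)
print("variables in the model:", selected)
```

With no variables in the model, this partial F reduces to the ordinary one-way ANOVA F for the candidate variable, which is one way to see the parallel with stepwise regression. Keeping F to remove below F to enter prevents a variable from being entered and removed in alternation.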

Capitalizing on chance. A common misinterpretation of the results of stepwise discriminant analysis is to take statistical significance levels at face value. When STATISTICA decides which variable to include in or exclude from the model at the next step of the analysis, it computes the significance of the contribution of each variable under consideration and then acts on the most extreme of those values. Therefore, by nature, the stepwise procedures will capitalize on chance because they "pick and choose" the variables to be included in the model so as to yield maximum discrimination. Thus, when using the stepwise approach you should be aware that the significance levels do not reflect the true alpha error rate, that is, the probability of erroneously rejecting H0 (the null hypothesis that there is no discrimination between groups).
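
A small simulation can make this inflation concrete. The Python sketch below is an illustration under stated assumptions, not a reproduction of any STATISTICA output: it generates pure noise (so H0 is true for every variable), computes a one-way ANOVA F for each of 20 variables, and compares how often a single pre-specified variable exceeds a 5% cutoff with how often the best of the 20 does. The cutoff is estimated from the simulated F values of the pre-specified variable, so no F-distribution tables are needed; the function name anova_f_all and all sample sizes are hypothetical.

```python
import numpy as np

def anova_f_all(X, y):
    """One-way ANOVA F statistic for every column of X at once."""
    n, k = X.shape
    groups = np.unique(y)
    g = len(groups)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(k)
    ss_within = np.zeros(k)
    for group in groups:
        Xg = X[y == group]
        group_mean = Xg.mean(axis=0)
        ss_between += len(Xg) * (group_mean - grand_mean) ** 2
        ss_within += ((Xg - group_mean) ** 2).sum(axis=0)
    return (ss_between / (g - 1)) / (ss_within / (n - g))

rng = np.random.default_rng(0)
n_sims, n_vars = 2000, 20
y = np.repeat([0, 1, 2], 20)           # three groups of 20 cases each

f_first = np.empty(n_sims)             # F of one variable chosen in advance
f_best = np.empty(n_sims)              # F of the variable a first step would pick
for i in range(n_sims):
    X = rng.normal(size=(len(y), n_vars))   # no variable discriminates
    f_values = anova_f_all(X, y)
    f_first[i] = f_values[0]
    f_best[i] = f_values.max()

# Empirical 5% cutoff for a single variable specified before seeing the data.
cutoff = np.quantile(f_first, 0.95)
rate_first = np.mean(f_first > cutoff)
rate_best = np.mean(f_best > cutoff)

print(f"pre-specified variable exceeds the cutoff in {rate_first:.1%} of runs")
print(f"best of {n_vars} variables exceeds the cutoff in {rate_best:.1%} of runs")
```

Because the 20 noise variables are independent, the chance that at least one exceeds a 5% cutoff is about 1 - 0.95^20, or roughly 64%, so the "significance" of the first variable selected says little about true discrimination between groups.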

See also, Exploratory Data Analysis and Data Mining Techniques.