Overview. This example illustrates the procedures for performing residual analysis. Residuals are the deviations of the observed values on the dependent variable from the values predicted by the current model. The ANOVA models used to analyze the dependent variable in most of the programs in the Experimental Design module make certain assumptions about the distribution of the residual (but not the predicted) values. In summary, the ANOVA model assumes normality, linearity, homoscedasticity (homogeneity of variance), and independence of the residuals. These properties of the residuals can be inspected using the options available in the Experimental Design module.
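As a minimal illustration (plain Python, not Statistica's implementation), a residual is the observed value minus the value the model predicts for the same case. The function name and the values below are hypothetical.

```python
def residuals(observed, predicted):
    """Return observed - predicted for each case (the residuals)."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return [y - yhat for y, yhat in zip(observed, predicted)]


# Hypothetical values, for illustration only.
print(residuals([12.0, 7.5, 9.0], [10.0, 8.0, 9.0]))  # [2.0, -0.5, 0.0]
```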

Specifying the variables and design. Box and Draper (1987, page 205) report a study of the behavior of worsted yarn under cycles of repeated loading. (The same study is used in Example 3: Analyzing a 3**3 Full Factorial, and in Special Topics Example 4 - Box-Cox Transformation of a Dependent Variable.) The dependent variable of interest is the number of cycles to failure. Because that variable is highly variable, its log10-transformed values were also considered. The data are contained in the data file Textile2.sta. The three factors included in the study are:

Factor | Low | Med | High
Length of specimen (mm) | 250 | 300 | 350
Amplitude of load cycle (mm) | 8 | 9 | 10
Load (g) | 40 | 45 | 50
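The analysis reports coefficients for the factors recoded to -1, 0, and 1. A minimal sketch of that coding, assuming equally spaced levels as in the table (the helper name `code_level` is hypothetical):

```python
def code_level(x, low, high):
    """Map a raw factor level onto the -1/0/+1 scale (low -> -1, high -> +1)."""
    center = (low + high) / 2
    half_range = (high - low) / 2
    return (x - center) / half_range


# Levels taken from the factor table.
print([code_level(v, 250, 350) for v in (250, 300, 350)])  # [-1.0, 0.0, 1.0]
print([code_level(v, 8, 10) for v in (8, 9, 10)])          # [-1.0, 0.0, 1.0]
print([code_level(v, 40, 50) for v in (40, 45, 50)])       # [-1.0, 0.0, 1.0]
```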

In this example, we will first look at the untransformed dependent variable values to illustrate a residual analysis indicating a faulty model and/or dependent variable. To begin the analysis, open the example data file Textile2.sta and start Experimental Design (DOE).

Ribbon bar. Select the Home tab. In the File group, click the Open arrow and from the menu, select Open Examples to display the Open a Statistica Data File dialog box. Double-click the Datasets folder, and open the Textile2.sta data set. Then, select the Statistics tab, and in the Industrial Statistics group, click DOE to display the Design & Analysis of Experiments Startup Panel.

Classic menus. On the File menu, select

In the Startup Panel, double-click 3**(k-p) and Box-Behnken designs to display the Design and Analysis of Experiments with Three-Level factors dialog box.

Select the Analyze design tab. Click the Variables button, and select Cycles and Log_Cycl as the Dependent variables; Length, Amplitud, and Load as the Indep (factors); and click OK.

Click OK in the Design and Analysis of Experiments with Three-Level factors dialog box to display the Analysis of an Experiment with Three-Level Factors dialog box.

Reviewing results for the untransformed dependent variable. Box and Draper (1987) began their residual analysis by fitting the first-degree polynomial equation to the original (unlogged) dependent variable. To reproduce the model coefficients reported by Box and Draper (1987, p. 214), select Cycles in the Variable drop-down list near the top of the dialog box, under the Summary box.

On the Model tab, select the No interactions option button, select the Ignore some effects check box, and in the Customized (Pooled) Error Term dialog box, highlight

Click OK in the Customized (Pooled) Error Term dialog box.

On either the Quick tab or the ANOVA/Effects tab, click the Summary: Effect estimates button to produce the Effect Estimates spreadsheet.

Shown above are the last four columns of the spreadsheet, with the coefficients for the recoded (-1, 0, 1) factor values. The column of t-values (not shown in the illustration above) shows that all three linear effects are highly significant.
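Because the 27 runs form a full 3 x 3 x 3 factorial in coded units, the three linear columns are mutually orthogonal, so each first-degree coefficient reduces to sum(x*y) / sum(x**2) and the intercept is the mean response. The sketch below relies on that orthogonality (it is not a general least-squares solver and not Statistica's code), and the response in the usage example is synthetic, not the Textile2.sta data.

```python
from itertools import product

# Full 3**3 design in coded units; column order: Length, Amplitude, Load.
DESIGN = list(product((-1, 0, 1), repeat=3))


def fit_first_degree(y, design=DESIGN):
    """Fit y = b0 + b1*x1 + b2*x2 + b3*x3 by least squares.

    Valid only for a balanced, orthogonal design such as the full 3**3
    factorial, where each slope is sum(x*y) / sum(x**2).
    """
    n = len(design)
    if len(y) != n:
        raise ValueError("need one response per design run")
    b0 = sum(y) / n
    coefs = [b0]
    for j in range(len(design[0])):
        sxy = sum(row[j] * yi for row, yi in zip(design, y))
        sxx = sum(row[j] ** 2 for row in design)
        coefs.append(sxy / sxx)
    predicted = [b0 + sum(b * x for b, x in zip(coefs[1:], row)) for row in design]
    return coefs, predicted


# Synthetic response that is exactly linear in the coded factors.
y = [10 + 2 * a - 3 * b + 0.5 * c for a, b, c in DESIGN]
coefs, predicted = fit_first_degree(y)
print(coefs)  # [10.0, 2.0, -3.0, 0.5]
```

Subtracting `predicted` from the observed values gives the residuals that the plots below examine.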

Residual Plots and Analyses. Select the Residual plots tab, and click the Histogram of residuals button. The histogram shows that the residuals are at least somewhat positively skewed.
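One way to put a number on what the histogram shows is the moment coefficient of skewness, g1 = m3 / m2**1.5, where m2 and m3 are the second and third central moments; positive values indicate a longer right tail. This is a minimal sketch; the descriptive statistics in the program may report a bias-adjusted variant, and the data below are made up.

```python
def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


print(skewness([-2, -1, 0, 1, 2]))     # 0.0 (symmetric)
print(skewness([0, 0, 0, 1, 10]) > 0)  # True (long right tail)
```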

Note that you can use the Interactive Graphics Controls at the bottom of the graph window to adjust the transparency of the plot areas.

Now, click the Histogram of predicted values button. Here we see a hint of serious trouble: four of the predicted values are negative, which makes no physical sense for the number-of-cycles-to-failure dependent variable, although there is no clear evidence that the predicted values are negatively skewed.

Next, click the Normal plot button in the Probability plots of residuals group box. The plot of Expected Normal Values (and corresponding probabilities) against the Residuals shows evidence of non-linearity: very low and very high residuals fall below the straight line expected under the ANOVA model assumptions, while moderate residuals fall above it.
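A normal probability plot pairs each sorted residual with an expected normal value. The minimal sketch below uses Blom's plotting position, (j - 3/8) / (n + 1/4), with the standard library's `statistics.NormalDist`; the exact plotting-position formula the program uses is an assumption here and may differ slightly. If the residuals are normal, the pairs fall close to a straight line.

```python
from statistics import NormalDist


def expected_normal_values(residuals):
    """Pair each sorted residual with its expected normal value (Blom positions)."""
    n = len(residuals)
    ordered = sorted(residuals)
    z = [NormalDist().inv_cdf((j - 0.375) / (n + 0.25)) for j in range(1, n + 1)]
    return list(zip(ordered, z))


# Hypothetical residuals.
for r, z in expected_normal_values([0.4, -1.2, 0.1, 2.3, -0.5]):
    print(f"{r:6.2f} {z:6.3f}")
```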

The Observed vs. predicted values plot shows the same pattern, and the Detrended normal plot shows it even more clearly. The Predicted vs. residual values plot shows that the relation between the Residuals and the Predicted Values is clearly not uniform, and the Observed vs. residual values plot shows severe non-uniformity in the relation between the Observed values and the Residuals. Fortunately, the uniformity of the Residuals vs. deleted residuals plot and of the Residuals vs. case numbers plot provides no evidence of outliers or of serial correlation in the observations, respectively. Nevertheless, it is clear that the residuals do not behave as they would if the usual ANOVA assumptions were met.
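In the usual least-squares sense, a deleted residual is the residual a case would have if it were left out of the fit, which equals e_i / (1 - h_i), where h_i is the case's leverage. For the first-degree model in the orthogonal, coded 3**3 design, h_i = 1/27 + (x1**2 + x2**2 + x3**2) / 18. This is a minimal sketch that assumes that design and model; the function names are hypothetical.

```python
from itertools import product

# Full 3**3 design in coded units (Length, Amplitude, Load).
DESIGN = list(product((-1, 0, 1), repeat=3))


def leverages(design=DESIGN):
    """Leverage h_i for a first-degree model in an orthogonal coded design."""
    n = len(design)
    sxx = [sum(row[j] ** 2 for row in design) for j in range(len(design[0]))]
    return [1 / n + sum(x ** 2 / s for x, s in zip(row, sxx)) for row in design]


def deleted_residuals(residuals, design=DESIGN):
    """Residual each case would have if it were dropped from the fit: e / (1 - h)."""
    return [e / (1 - h) for e, h in zip(residuals, leverages(design))]


h = leverages()
print(round(sum(h), 6))  # 4.0, the number of fitted coefficients
```

The leverages sum to the number of coefficients (intercept plus three slopes), and corner runs have the largest leverage, so their deleted residuals are inflated the most.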

Perhaps most troubling of all, there is clear evidence that the cell variances on Cycles are strongly related to the cell means. Making this determination takes a little ingenuity, because there is only one case in each of the 27 cells of the 3 x 3 x 3 design, so the variance within each cell is, strictly speaking, undefined. However, for each pair of factors, you can collapse across the third factor and assess the relation between the resulting marginal cell means and the corresponding marginal cell standard deviations.

For example, on the Means tab (or the Quick tab), in the Observed marginal means group box, click the Display button, and in the Compute marginal means for dialog box, highlight Length and Amplitude. This produces a spreadsheet with the marginal cell means and standard deviations for Length and Amplitude, collapsing across Load. If you were to compute the correlation between these marginal means and standard deviations, it would be .96, nearly perfect.
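A minimal sketch of that check outside the program: group the 27 responses into Length x Amplitude cells (collapsing across Load), compute each cell's mean and standard deviation, and correlate the two. The function names are hypothetical, and the response below is synthetic and purely multiplicative, so its cell SDs are exactly proportional to its cell means; the .96 quoted above comes from the actual Cycles data, which this code does not reproduce.

```python
from itertools import product
from statistics import mean, stdev


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def mean_sd_correlation(design, y, factors=(0, 1)):
    """Correlate marginal cell means with cell SDs, collapsing over other factors."""
    cells = {}
    for row, yi in zip(design, y):
        key = tuple(row[f] for f in factors)
        cells.setdefault(key, []).append(yi)
    means = [mean(v) for v in cells.values()]
    sds = [stdev(v) for v in cells.values()]
    return pearson(means, sds)


DESIGN = list(product((-1, 0, 1), repeat=3))
y = [100 * 2.0 ** (a + b + c) for a, b, c in DESIGN]  # synthetic
print(round(mean_sd_correlation(DESIGN, y), 6))  # 1.0
```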

Reviewing results for the transformed dependent variable. Select Log_Cycl in the Variable drop-down list, and repeat the analysis. You will find markedly lower correlations between the marginal cell means and standard deviations, and the residual plots reveal none of the irregularities identified above for the Cycles dependent variable. The log transformation produces markedly better-behaved residuals.

One issue that remains, however, is whether it is better to transform a dependent variable with poorly behaved residuals or to add higher-order terms to the prediction model to account for the non-linearity in the diagnostic plots of the residuals from the first-degree polynomial (linear) model. For this example, Box and Draper (1987) present evidence that the former alternative is preferable.

Shown below are the last four columns of the Effect estimates spreadsheet for the Log_Cycl dependent variable.

Of greatest interest is the R-square value of .96585, showing that virtually all of the variability in the transformed dependent variable is accounted for by the simple first-degree polynomial model. If you were to redo the analysis using the second-degree polynomial model to fit the untransformed dependent variable Cycles (by selecting 2-way interactions (linear x linear) on the Model tab), the R-square value of .93788 is actually smaller. Thus, the greater complexity of a model with added higher-order terms does not pay off in greater predictability of the untransformed dependent variable. Parsimony favors the simpler model with the transformed dependent variable over the more complex model with the untransformed dependent variable, because the latter explains the responses no better than the former.
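R-square is the proportion of variability in the dependent variable accounted for by the model: 1 - SS_residual / SS_total. A minimal sketch computing it from observed and predicted values; the function name and numbers are made up for illustration.

```python
def r_square(observed, predicted):
    """Proportion of variance accounted for: 1 - SS_residual / SS_total."""
    mean_y = sum(observed) / len(observed)
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_resid / ss_total


# Hypothetical values: SS_total = 5.0 and SS_residual = 0.1, so R-square is about 0.98.
print(r_square([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```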

Summary. This example illustrates some of the procedures involved in conducting Residual analysis. Careful inspection of diagnostic plots and statistics can lead to more fruitful analyses and, at least sometimes, more interpretable results. For an additional example of diagnostic procedures for determining the need for transformation of the dependent variable, see Special Topics Example 4 - Box-Cox Transformation of a Dependent Variable.