Special Topics
Example 4 - Box-Cox Transformation of a Dependent Variable
Overview. This
example describes the procedures involved in diagnosing the need for transformation
of the dependent variable and determining the appropriate value of lambda for the Box-Cox transformation
of a dependent variable. By using the Box-Cox transformation option available
on the Box-Cox
tab, you can easily determine the transformation from the family of
power transformations that minimizes the error variability (the unpredicted
variation) in the dependent variable.
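For readers who want to see the power family in concrete terms, the following is a minimal sketch of the transformation, not Statistica's own code. It assumes the geometric-mean-normalized form described by Box and Cox (1964), which makes residual sums of squares comparable across values of lambda; whether Statistica applies exactly this normalization is an assumption. The function name `box_cox` and the tolerance argument are hypothetical.

```python
import math

def box_cox(y, lam, tol=1e-12):
    """Normalized Box-Cox transform of strictly positive values y.

    Assumed form (Box and Cox, 1964):
        (y**lam - 1) / (lam * gm**(lam - 1))   for lam != 0
        gm * ln(y)                             for lam == 0
    where gm is the geometric mean of y.
    """
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive values")
    # Geometric mean of the observations, computed on the log scale.
    gm = math.exp(sum(math.log(v) for v in y) / len(y))
    if abs(lam) < tol:
        # Limiting case: lambda = 0 corresponds to the natural logarithm.
        return [gm * math.log(v) for v in y]
    scale = lam * gm ** (lam - 1.0)
    return [(v ** lam - 1.0) / scale for v in y]
```

With lambda equal to 1 the function only shifts the data by a constant, which is why a lambda near 1 corresponds to "no transformation".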
Specifying the variables
and design. Box and Draper (1987, page 205) report a study of the
behavior of worsted yarn under cycles of repeated loading. (The same study
is used as the basis for Example
3: Analyzing a 3**3 Full Factorial and Special
Topics Example 3 - Residuals Analysis.) The dependent variable of
interest is the number of cycles to failure. Because of the large variability
in that variable, a transformation of the dependent variable values was
also considered. The data are contained in the data file Textile2.sta.
The three factors included in the study were:
Factor | Low | Med | High |
Length of specimen (mm) | 250 | 300 | 350 |
Amplitude of load cycle (mm) | 8 | 9 | 10 |
Load (g) | 40 | 45 | 50 |
In this example, we will first look at the diagnostic Box-Cox transformation
graph to determine the need for transformation of the dependent variable.
To begin the analysis, open the example data file Textile2.sta
and start the Experimental Design
module.
Ribbon bar.
Select the Home tab. In
the File group, click the Open arrow and from the menu, select Open Examples to display the Open a Statistica Data File dialog
box. Double-click the Datasets
folder, and open the Textile2.sta data
set. Then, select
the Statistics
tab, and in the Industrial Statistics group, click
DOE
to display the Design
& Analysis of Experiments Startup
Panel.
Classic
menus. On the File menu, select Open Examples to
display the Open a Statistica Data File
dialog box. The Textile2.sta data
file is located in the Datasets
folder. Then, on the Statistics - Industrial Statistics
& Six Sigma submenu, select Experimental
Design (DOE) to display the Design
& Analysis of Experiments Startup Panel.
On the Quick tab, double-click
3**(k-p) and Box-Behnken designs
to display the Design
& Analysis of Experiments with Three-Level Factors dialog
box.
Select the Analyze
design tab. Click the Variables
button, and in the variable selection dialog box select Cycles
and Log_Cycl as the Dependent
variables, and select the three variables Length,
Amplitud, and Load
as the Indep (factors). Click
OK in the variable selection
dialog box.
Click OK in the Design
& Analysis of Experiments with Three-Level Factors dialog box
to display the Analysis
of an Experiment with Three-Level Factors dialog box.
Reviewing results for
the untransformed dependent variable. Box and Draper (1987) fitted
a first-degree polynomial model to the original (unlogged) dependent
variable.
To reproduce the model coefficients reported by Box and Draper (1987,
p. 214), select Cycles in the
Variable drop-down list (located
toward the top of the dialog box, under the Summary box).
Select the Model
tab, select the No interactions
option button, and then select the Ignore
some effects check box. Click OK
in the note that is displayed.
In the Customized (Pooled) Error Term
dialog box, select the quadratic effects (the effects with a Q next to
them) to pool them into the error term. Click OK.
On either the Quick
tab or the
ANOVA/Effects tab, click the Summary:
Effect estimates button to display the Effect
estimates spreadsheet as we did in Special
Topics Example 3 - Residuals Analysis.
Shown above are the 4 right-most columns of the spreadsheet, with the
coefficients for the recoded (-1,0,1) factor values. The column of t-values (not shown in the illustration
above) shows that all three linear effects are highly significant.
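The recoding to (-1,0,1) mentioned above can be sketched as follows. The low and high settings come from the factor table in this example; the function names `code_level` and `code_run` are hypothetical, and the sketch assumes the usual centering-and-scaling rule for equally spaced three-level factors rather than describing Statistica's internal routine.

```python
def code_level(value, low, high):
    """Map a factor setting onto the coded scale (low -> -1, high -> +1)."""
    midpoint = (low + high) / 2.0
    half_range = (high - low) / 2.0
    return (value - midpoint) / half_range

# (low, high) settings taken from the factor table in this example.
FACTOR_RANGES = {
    "Length": (250, 350),    # length of specimen, mm
    "Amplitud": (8, 10),     # amplitude of load cycle, mm
    "Load": (40, 50),        # load, g
}

def code_run(length, amplitude, load):
    """Return the coded (-1, 0, 1) settings for one experimental run."""
    settings = {"Length": length, "Amplitud": amplitude, "Load": load}
    return tuple(code_level(settings[name], *FACTOR_RANGES[name])
                 for name in ("Length", "Amplitud", "Load"))

# A run at medium length, low amplitude, and high load.
print(code_run(300, 8, 50))
```

Because every factor is put on the same scale, the resulting coefficients can be compared directly across factors.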
Now select the Box-Cox
tab, and click the Box-Cox Transformation
button. The Box-Cox transformation
graph and two spreadsheets will be produced.
The graph shows the Residual sum of squares, given the model, as a function
of lambda. It also marks the maximum likelihood estimate of lambda,
which is the value of lambda
for which the Residual sum of squares is a minimum. The graph for this
example shows that the minimum Residual sum of squares, 243413.142,
occurs at a lambda of
-.0593.
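The idea behind the graph can be sketched as a grid search: transform the response for each candidate lambda, fit the first-order model, and keep the lambda with the smallest Residual sum of squares. The sketch below is illustrative only. The response values are hypothetical (generated from a made-up log-linear model with a small deterministic perturbation, not the Textile2.sta data), the normalized transform is an assumption, and the least-squares shortcut is valid only because a balanced, coded full factorial design has mutually orthogonal columns.

```python
import math
from itertools import product

def box_cox(y, lam):
    """Geometric-mean-normalized Box-Cox transform (assumed form)."""
    gm = math.exp(sum(math.log(v) for v in y) / len(y))
    if abs(lam) < 1e-12:
        return [gm * math.log(v) for v in y]
    return [(v ** lam - 1.0) / (lam * gm ** (lam - 1.0)) for v in y]

def first_order_rss(design, z):
    """Residual sum of squares of z on an intercept plus linear effects.

    Relies on the design being a balanced, coded factorial: each column
    sums to zero and is orthogonal to the others, so each slope can be
    estimated independently.
    """
    n = len(z)
    fitted = [sum(z) / n] * n
    for j in range(len(design[0])):
        col = [row[j] for row in design]
        slope = sum(x * v for x, v in zip(col, z)) / sum(x * x for x in col)
        fitted = [f + slope * x for f, x in zip(fitted, col)]
    return sum((v - f) ** 2 for v, f in zip(z, fitted))

# Coded 3**3 full factorial design (27 runs).
design = list(product((-1, 0, 1), repeat=3))

# Hypothetical response: log-linear in the factors, so lambda near 0 is expected.
y = [math.exp(6.0 + 0.8 * a - 0.6 * b - 0.4 * c
              + 0.01 * (((7 * i) % 5 - 2) / 2))
     for i, (a, b, c) in enumerate(design)]

# Profile the Residual sum of squares over lambda = -2.0, -1.9, ..., 2.0.
lambdas = [k / 10 for k in range(-20, 21)]
profile = {lam: first_order_rss(design, box_cox(y, lam)) for lam in lambdas}
best_lambda = min(profile, key=profile.get)
print(best_lambda, profile[best_lambda], profile[1.0])
```

A finer grid or a one-dimensional optimizer would refine the estimate; the coarse grid is used here only to mirror the shape of the plotted curve.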
The accompanying Box-Cox Transformation
spreadsheet lists the Observed values and Residuals for the dependent
variable, and corresponding Transformed observed values and Transformed
residuals, using the Box-Cox transformation with the maximum likelihood
estimate of lambda.
The Final statistics spreadsheet
lists the maximum likelihood estimate of Lambda,
the SSE(1), the maximum likelihood
Chi-square(1), and its associated
probability, p.
The SSE(1) is the Residual
sum of squares, given the model and using a single parameter, lambda, to transform the dependent
variable, and the Chi-square(1)
is the appropriate statistic for testing the reduction in the Residual
sum of squares produced by the Box-Cox
transformation with the maximum likelihood estimate of lambda
(see Maddala, 1977).
The test of significance of the Chi-square(1)
value therefore is a test of the need for transformation of the dependent
variable. For this example, the Chi-square
value of 84.08554 with 1 degree
of freedom is highly significant, indicating that the Residual sum of
squares is significantly reduced by using the Box-Cox
transformation with a value of lambda
of -.0593.
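As a rough check on how such a Chi-square value translates into a probability, the sketch below computes the upper-tail probability for 1 degree of freedom using only the standard library. It also shows one common likelihood-ratio form of the statistic, n times the log of the ratio of the untransformed to the transformed Residual sum of squares; that this is the exact formula Statistica uses is an assumption, and the function names are hypothetical. The two Chi-square values are the ones reported in this example (the second is the value obtained for Log_Cycl).

```python
import math

def chi_square_1df_p(chi_square):
    """Upper-tail probability of a chi-square variate with 1 degree of freedom.

    For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_square / 2.0))

def lr_chi_square(n, sse_untransformed, sse_at_lambda_hat):
    """Assumed likelihood-ratio statistic: n * ln(SSE(lambda=1) / SSE(lambda_hat)).

    Both sums of squares are taken to come from the normalized transform,
    so that they are on a comparable scale.
    """
    return n * math.log(sse_untransformed / sse_at_lambda_hat)

# Chi-square values reported in this example.
p_cycles = chi_square_1df_p(84.08554)   # untransformed Cycles
p_log = chi_square_1df_p(0.941770)      # Log_Cycl
print(p_cycles, p_log)
```

The first probability is vanishingly small, while the second is about 0.33, which matches the conclusions drawn in the text for the two variables.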
Reviewing results for
the transformed dependent variable. In practice, it is not important
that we use the exact estimated value of lambda
for transforming the dependent variable. Rather, as a general rule, we
should consider the following transformations:
Approximate lambda | Suggested transformation of y |
-1 | Reciprocal |
-0.5 | Reciprocal square root |
0 | Natural logarithm |
0.5 | Square root |
1 | None |
The value of lambda of -.0593 is close to 0, suggesting the
appropriateness of a logarithmic transformation.
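The lookup described above can be sketched as a nearest-value rule over the table. The rule itself (pick the closest conventional lambda) is a simplifying assumption for illustration; in practice the choice should also take the subject matter and the flatness of the Residual sum of squares curve into account. The names `SUGGESTED` and `suggest_transformation` are hypothetical.

```python
# Conventional lambda values and transformations, copied from the table above.
SUGGESTED = {
    -1.0: "Reciprocal",
    -0.5: "Reciprocal square root",
    0.0: "Natural logarithm",
    0.5: "Square root",
    1.0: "None",
}

def suggest_transformation(lam):
    """Return the conventional lambda closest to lam and its transformation."""
    nearest = min(SUGGESTED, key=lambda candidate: abs(candidate - lam))
    return nearest, SUGGESTED[nearest]

# The maximum likelihood estimate reported for Cycles in this example.
print(suggest_transformation(-0.0593))
```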
The Textile2.sta data file
contains the variable called Log_Cycl,
which is a logarithmic transformation of the Cycles
variable.
Select Log_Cycl in the Variable drop-down list, select the
Box-Cox
tab, and click the Box-Cox Transformation
button.
In the Final statistics spreadsheet,
the Chi-square value of .941770 with 1 degree of freedom is
not significant, indicating that the Residual
sum of squares is not significantly reduced by using the Box-Cox transformation of the Log_Cycl variable with a value of lambda of .628228.
The logarithmic transformation of the Cycles
dependent variable appears to be adequate.
Summary. The
Box-Cox transformation is a widely used
variance-stabilizing transformation of a dependent variable. It also has
the desirable property of reducing or eliminating correlation between
the group means and standard deviations; such correlation violates one of the assumptions
of the ANOVA model.
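The link between group means and standard deviations can be illustrated with a small sketch. The three groups below are hypothetical values chosen so that the spread is proportional to the group mean (a multiplicative error pattern); they are not taken from Textile2.sta. After a logarithmic transformation, which is the lambda = 0 member of the power family, the group standard deviations no longer depend on the group level.

```python
import math
from statistics import mean, stdev

# Hypothetical groups whose spread grows in proportion to their level.
groups = [[m * 0.8, m, m * 1.25] for m in (100.0, 400.0, 1600.0)]

raw_means = [mean(g) for g in groups]
raw_sds = [stdev(g) for g in groups]

# Standard deviations of the same groups on the natural log scale.
log_sds = [stdev([math.log(v) for v in g]) for g in groups]

for m, s_raw, s_log in zip(raw_means, raw_sds, log_sds):
    print(f"mean={m:10.2f}  sd={s_raw:10.2f}  sd(log)={s_log:.4f}")
```

On the raw scale the standard deviation rises with the mean; on the log scale it is the same for every group, which is the behavior the ANOVA model assumes.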
For additional information regarding the power family of transformations
for a dependent variable, see Box and Cox (1964), Box and Draper (1987),
and Maddala (1977).
For an overview and computational details, see the Special
Topic in Experimental Design - Box-Cox transformation of a dependent variable.
Descriptions of procedures for examining residuals can be found in the
Special Topics Example
3 - Residuals Analysis.