Multiple Regression: Notes and Technical Information
General. The Multiple Regression routine consists of two major parts. The first calculates a correlation matrix (or extracts a correlation matrix if matrix input is selected) according to the user's specifications (e.g., missing data treatment, selection conditions). The second performs the actual multiple regression analyses.
Calculating Multiple Regression, Matrix Inversion. All calculations involved in the actual multiple regression analysis are performed in double precision. Matrix inversion is accomplished via sweeping (see Dempster, 1969, p. 62). The regression weights, residual sums of squares, tolerances, and partial correlations are also calculated as part of the sweeping operation (see also Jennrich, 1977).
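The sweeping operation can be illustrated with a short sketch. This is not the routine's actual source code, just a standard implementation of the sweep operator (Dempster, 1969) applied to an augmented cross-product matrix; the function name, variable names, and toy data are assumptions made for illustration:

```python
import numpy as np

def sweep(a, k):
    """Return a copy of the symmetric matrix `a` swept on pivot index k.

    Sweeping the augmented cross-product matrix [X'X X'y; y'X y'y] on
    every predictor index leaves the regression weights in the y column
    and the residual sum of squares in the y'y cell.
    """
    a = np.asarray(a, dtype=float)
    d = a[k, k]
    out = a - np.outer(a[:, k], a[k, :]) / d   # a_ij - a_ik * a_kj / a_kk
    out[k, :] = a[k, :] / d
    out[:, k] = a[:, k] / d
    out[k, k] = -1.0 / d
    return out

# Toy regression of y on x with an intercept, done entirely by sweeping.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])        # exactly y = 2x
Z = np.column_stack([np.ones_like(x), x, y])
A = Z.T @ Z                                # augmented cross-product matrix
for k in (0, 1):                           # sweep intercept and x columns
    A = sweep(A, k)
b = A[:2, 2]                               # regression weights: [0, 2]
rss = A[2, 2]                              # residual sum of squares: 0
```

After sweeping, the upper-left block holds the negative inverse of the predictor cross-product matrix, which is why quantities such as tolerances and partial correlations fall out of the same operation.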
Statistical Significance Tests. The standard formulas are used for calculating the F-value associated with the multiple R, and for the t-values associated with the regression coefficients (e.g., see Cooley & Lohnes, 1971; Darlington, 1990; Lindeman, Merenda, & Gold, 1980; Morrison, 1967; Neter, Wasserman, & Kutner, 1985; Pedhazur, 1973; Stevens, 1986; Younger, 1985).
Residuals. The standard error of a residual score is computed as the square root of:

[1 - 1/n - (Xraw - Xmean)*C⁻¹*(Xraw - Xmean)'] * RMS

where

Xraw is the vector of raw data for the independent variables
Xmean is the vector of means for the independent variables
C⁻¹ is the inverse of the matrix of cross-products of deviations for the independent variables
n is the number of valid cases
RMS is the residual mean square

The terms 1/n and Xmean are dropped if there is no intercept (regression forced through the origin). The standardized residuals are obtained by dividing each residual by the square root of the residual mean square.
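These residual computations can be sketched in a few lines of NumPy. The data set and variable names below are invented for illustration; they are not part of the routine:

```python
import numpy as np

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])   # independent variable(s)
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

n, p = X.shape
D = X - X.mean(axis=0)               # Xraw - Xmean, case by case
C_inv = np.linalg.inv(D.T @ D)       # inverse of cross-products of deviations

# Ordinary least squares with an intercept, to get residuals and RMS.
Z = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(Z, y, rcond=None)
resid = y - Z @ b
rms = resid @ resid / (n - p - 1)    # residual mean square

# sqrt{[1 - 1/n - (Xraw - Xmean) C^-1 (Xraw - Xmean)'] * RMS}
h = 1.0 / n + np.einsum('ij,jk,ik->i', D, C_inv, D)
se_resid = np.sqrt((1.0 - h) * rms)
std_resid = resid / np.sqrt(rms)     # standardized residuals
```

The quadratic-form term 1/n + (Xraw - Xmean) C⁻¹ (Xraw - Xmean)' is the leverage of a case; with an intercept the leverages sum to the number of estimated parameters.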
The Mahalanobis distance is the distance of a case from the centroid of all cases in the space defined by the independent variables. It is computed as:

(n - 1)*(Xraw - Xmean)*C⁻¹*(Xraw - Xmean)'

where

Xraw is the vector of raw data for the independent variables
Xmean is the vector of means for the independent variables
C⁻¹ is the inverse of the matrix of cross-products of deviations for the independent variables
n is the number of valid cases

The terms 1/n and Xmean are dropped if there is no intercept (regression forced through the origin). Refer to the Multiple Regression Examples for an example of how Mahalanobis distances can aid in the detection of outliers.
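The Mahalanobis distance formula can be evaluated directly. The small data set below is hypothetical, chosen only to exercise the formula:

```python
import numpy as np

X = np.array([[2.0, 4.0],
              [3.0, 5.0],
              [4.0, 7.0],
              [5.0, 6.0],
              [9.0, 12.0]])            # two independent variables, 5 cases
n = X.shape[0]

D = X - X.mean(axis=0)                 # Xraw - Xmean for each case
C_inv = np.linalg.inv(D.T @ D)         # inverse cross-product matrix C^-1
md = (n - 1) * np.einsum('ij,jk,ik->i', D, C_inv, D)
```

A useful sanity check: the distances sum to (n - 1) times the number of independent variables, because the quadratic forms sum to the trace of an identity matrix.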
The deleted residual is the residual which would have been obtained had the case not been included in the estimation of the regression equation. It is calculated by dividing the ordinary residual by:

1 - 1/n - (Xraw - Xmean)*C⁻¹*(Xraw - Xmean)'

where

Xraw is the vector of raw data for the independent variables
Xmean is the vector of means for the independent variables
C⁻¹ is the inverse of the matrix of cross-products of deviations for the independent variables
n is the number of valid cases

The terms 1/n and Xmean are dropped if there is no intercept (regression forced through the origin). Refer to the Multiple Regression Examples for an example of how deleted residuals can aid in the detection of outliers.
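The deleted residuals can be computed for every case at once from the ordinary residuals, without refitting the model n times. The sketch below uses illustrative data and names, not the routine's code:

```python
import numpy as np

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.1, 2.3, 2.9, 4.2, 4.8])
n = X.shape[0]

Z = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(Z, y, rcond=None)
resid = y - Z @ b                      # ordinary residuals

D = X - X.mean(axis=0)
C_inv = np.linalg.inv(D.T @ D)
# Divide by 1 - 1/n - (Xraw - Xmean) C^-1 (Xraw - Xmean)'
denom = 1.0 - 1.0 / n - np.einsum('ij,jk,ik->i', D, C_inv, D)
deleted_resid = resid / denom
```

Each value equals the residual that a leave-one-out refit would produce for that case, which is what makes the shortcut formula useful.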
Cook's distance (Cook, 1977) is useful for assessing the changes in all residuals that would result if the respective case were omitted from the regression analysis. It is defined as:

{Deleted residual² * [1/n + MD/(n - 1)]} / [(No. of vars + 1)*RMS]

where

MD is the Mahalanobis distance
RMS is the residual mean square

If there is no intercept, n - 1 is replaced by n, the term 1/n is dropped, and the term +1 (adding 1 to the number of independent variables) is dropped.
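Cook's distance combines the quantities defined above. The data in this sketch are invented, with a deliberately discrepant, high-leverage last case:

```python
import numpy as np

X = np.array([[1.0], [2.0], [3.0], [4.0], [10.0]])
y = np.array([1.0, 2.1, 2.9, 4.2, 1.0])    # last case is discrepant
n, p = X.shape

Z = np.column_stack([np.ones(n), X])
b, *_ = np.linalg.lstsq(Z, y, rcond=None)
resid = y - Z @ b
rms = resid @ resid / (n - p - 1)           # residual mean square

D = X - X.mean(axis=0)
C_inv = np.linalg.inv(D.T @ D)
md = (n - 1) * np.einsum('ij,jk,ik->i', D, C_inv, D)   # Mahalanobis distance
deleted = resid / (1.0 - 1.0 / n - md / (n - 1))       # deleted residuals

# {Deleted residual^2 * [1/n + MD/(n - 1)]} / [(No. of vars + 1) * RMS]
cook = deleted**2 * (1.0 / n + md / (n - 1)) / ((p + 1) * rms)
```

The bracketed term 1/n + MD/(n - 1) is the case's leverage, so this expression agrees with the textbook form e²h / [(p + 1)s²(1 - h)²].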
Power Transformations for Dependent and Independent Variables. Statistical significance testing in multiple regression is based on the assumption of homogeneous residual variance over the range of the dependent variable. When this assumption is violated, an appropriate transformation of the dependent or independent variables may in some cases correct the problem. A class of power transformations that can be applied to the dependent variable or the independent variables is:

y' = y^λ for λ ≠ 0
y' = natural log(y) for λ = 0

This formulation encompasses the reciprocal transformation (λ = -1), the square root transformation (λ = .5), the square transformation (λ = 2), and the logarithmic transformation (λ = 0). However, note that it is required that all values of y be greater than 0 (zero). For additional details about these transformations, refer to Box and Cox (1964), Box and Tidwell (1962), Gunst, Mason, and Hess (1989), or Snee (1986).
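The power family above can be written as a small helper. The name `power_transform` is hypothetical, not part of any library:

```python
import numpy as np

def power_transform(y, lam):
    """y' = y**lam for lam != 0, natural log(y) for lam == 0.

    All values of y must be greater than 0.
    """
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("all values of y must be greater than 0")
    return np.log(y) if lam == 0 else y ** lam

# lam = -1: reciprocal; lam = 0.5: square root; lam = 2: square; lam = 0: log.
```

Defining the λ = 0 case as the natural log (rather than y⁰ = 1) is what makes the family continuous in λ in the Box-Cox sense.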