Example 7: Weighted Least Squares

Most regression models can be estimated via least squares methods, that is, by using the sum of squared deviations of the observed values from the predicted values as the loss function in the estimation procedure. However, as discussed in the Introductory Overviews, there are instances where weighting of the squared residuals might be in order.

For example, the computation of the standard errors of regression weights in linear regression rests on the assumption that the residuals show the same variability around the regression line over the entire range of the independent variables (that is, constant residual variance). When this assumption is violated, you should consider weighted least squares instead.

To illustrate, suppose a construction company wants to estimate the relationship between the size of a bid and the cost of preparing the bid. Intuitively, it makes sense to assume that the larger the project, the greater will be the residual variability of the cost about the estimated regression line.

Shown below is a scatterplot of a data set reported in Neter, Wasserman, and Kutner (1985, page 169; see the data file Bid_prep.sta, below). Note that the size of the bid (on the horizontal axis) is scaled in terms of millions of dollars, and the preparation cost is scaled in terms of thousands of dollars. As you can see, the variability of the residuals about the regression line tends to be larger for larger bids.

Without going into details (see Neter et al., 1985, page 167), it makes intuitive sense to weight the squared residuals by the inverse of the squared x values (here, the squared bid sizes). This places about equal emphasis on the larger and the smaller bids in the estimation, and the resulting estimates may become more stable. This estimation method is called weighted least squares estimation.
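The reasoning can be sketched in symbols. The notation here is an assumption introduced for illustration: x_i is the bid size, y_i the preparation cost, and b_0 and b_1 the intercept and slope. The sketch further assumes that the residual standard deviation is proportional to the bid size, which is one way to formalize the fan-shaped scatter described above; under that assumption, dividing by the squared bid size gives every case the same residual variance.

```latex
Q_w(b_0, b_1) = \sum_{i=1}^{n} \frac{1}{x_i^{2}} \bigl( y_i - b_0 - b_1 x_i \bigr)^{2},
\qquad
\text{if } \operatorname{Var}(\varepsilon_i) = \sigma^{2} x_i^{2}
\text{ then }
\operatorname{Var}\!\left( \frac{\varepsilon_i}{x_i} \right) = \sigma^{2}.
```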

Specifying the equation and loss function. Now, specify the weighted least squares model. First, open the Bid_prep.sta data file by selecting Open Examples from the File menu (classic toolbar) or by selecting Open Examples from the Open menu on the Home tab (ribbon bar); it is in the Datasets folder.

Then select Nonlinear Estimation from the Statistics - Advanced Linear/Nonlinear Models menu to display the Nonlinear Estimation Startup Panel. Next, select the User-specified regression, custom loss function option in the Startup Panel, then click the OK button to display the User-Specified Regression, Custom Loss dialog. Finally, click the Function to be estimated & loss function button to display the Estimated function and loss function dialog. Here, specify the regular linear regression equation (bid_cost = intercpt + slope*bid_size) in the Estimated function box and ((OBS-PRED)**2) * (1/bid_size**2) in the Loss function box.

The first part of the loss function (i.e., (OBS-PRED)**2) specifies standard least squares. However, for each case, the loss is weighted by the inverse of the squared bid size (i.e., multiplied by (1/bid_size**2)).
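To make the loss function concrete outside of STATISTICA, the following is a minimal Python sketch of the same expression, ((OBS-PRED)**2) * (1/bid_size**2), summed over cases. The function name weighted_loss and the three data points are hypothetical and chosen only so the arithmetic is easy to follow; they are not the Bid_prep.sta values.

```python
def weighted_loss(intercept, slope, bid_size, bid_cost):
    """Sum over cases of (OBS - PRED)**2 * (1 / bid_size**2)."""
    total = 0.0
    for x, obs in zip(bid_size, bid_cost):
        pred = intercept + slope * x           # linear regression equation
        total += (obs - pred) ** 2 * (1.0 / x ** 2)
    return total


# Made-up illustrative values, NOT the Bid_prep.sta data.
bid_size = [1.0, 2.0, 4.0]
bid_cost = [3.0, 5.0, 13.0]

# With intercept 1 and slope 2, only the last case has a residual (13 - 9 = 4).
# Its squared residual, 16, is divided by the squared bid size, 16.
loss = weighted_loss(1.0, 2.0, bid_size, bid_cost)
unweighted = sum((y - (1.0 + 2.0 * x)) ** 2 for x, y in zip(bid_size, bid_cost))
print(loss, unweighted)
```

In this sketch the same residual contributes 16 to the ordinary least squares loss but only 1 to the weighted loss, which is how the weighting reduces the influence of large bids.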

Estimation and results. Now, click the OK button on this dialog and again in the User-Specified Regression, Custom Loss dialog to display the Model Estimation dialog. Select the Asymptotic standard errors check box on the Advanced tab. Click the OK button to display the Results dialog. Here, click the Summary: Parameters & standard errors button to display a spreadsheet containing the parameter estimates and their standard errors.

The approximate standard errors for the intercept and slope parameters are .965 and .404, respectively. If you re-analyze these data with the standard least squares loss function (or via the Multiple Regression module), you will obtain standard errors of 3.252 and .5285, respectively. Thus, you can conclude that the parameter estimates based on the weighted least squares estimation are more stable (less subject to random sampling variation).

Weighted least squares via STATISTICA Multiple Regression or General Regression Models (GRM). You can also compute weighted least squares estimates for linear models via Multiple Regression, General Regression Models (GRM), or General Linear Models (GLM). To replicate the results from this example, first add a variable to Bid_prep.sta and compute the values of that new variable (e.g., called Weight) as 1/bid_size**2; that is, compute the same weight that you specified to modify the ordinary least squares Loss function in the Estimated function and loss function dialog shown earlier. Then specify this new variable as the Weight variable for the analysis, and make sure to select the Weighted moments check box and the DF = N -1 option button on the respective Startup Panels (e.g., see the Multiple Linear Regression Startup Panel). The Multiple Regression, GRM, and GLM modules use matrix algebra expressions (e.g., see Neter, Wasserman, and Kutner, 1985) to estimate the weighted least squares parameters, rather than the iterative procedure used in this example. For large data files, these programs are therefore much faster and more efficient.
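For a single predictor, the non-iterative solution reduces to weighted means and weighted sums of cross-products. The following Python sketch shows that closed form; it is an illustration of the general technique, not the code used by the STATISTICA modules. The helper name wls_line and the data are hypothetical, and the points are constructed to lie exactly on the line y = 1 + 2x so the result is easy to check.

```python
def wls_line(x, y, w):
    """Closed-form weighted least squares fit of y = intercept + slope * x."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up values lying exactly on y = 1 + 2x, NOT the Bid_prep.sta data.
bid_size = [1.0, 2.0, 4.0, 8.0]
bid_cost = [3.0, 5.0, 9.0, 17.0]
weight = [1.0 / x ** 2 for x in bid_size]   # same weight as in the loss function

intercept, slope = wls_line(bid_size, bid_cost, weight)
print(intercept, slope)
```

Because the estimates come from a fixed number of sums rather than from repeated evaluations of the loss function, the cost grows only linearly with the number of cases.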