Example 2: Best-Subset Regression

This example illustrates model building in GRM using best-subset regression. It uses the same data as Example 1, which are available in the Tomatoes.sta data file. The design is a 5-way mixed-level fractional-factorial design with both 2- and 3-level categorical predictor variables. All main and 2-way interaction effects are estimable for this design (see the GLM module Advanced Example: Type V Sums of Squares). A description of the variables in the data set can be found in the Experimental Design module example Designing and Analyzing a 23, 32 Experiment. Refer also to the Mixed 2 and 3 Level Designs topic for additional details concerning these types of designs.

Specifying the Analysis. Open the Tomatoes.sta data file and start General Regression Models:

Ribbon bar. Select the Home tab. In the File group, click the Open arrow and select Open Examples to display the Open a STATISTICA Data File dialog box. Open the data file, which is located in the Datasets folder. Then, select the Statistics tab. In the Advanced/Multivariate group, click Advanced Models and from the menu, select General Regression to display the General Regression Models Startup Panel.

Classic menus. From the File menu, select Open Examples to display the Open a STATISTICA Data File dialog box. Open the data file, which is located in the Datasets folder. Then, from the Statistics - Advanced Linear/Nonlinear Models submenu, select General Regression Models to display the General Regression Models Startup Panel.

Select Factorial ANOVA as the Type of analysis and Quick specs dialog as the Specification method. Then, click the OK button to display the GRM Factorial ANOVA Quick Specs dialog box.

On the Quick tab, click the Variables button to display the standard variable selection dialog box. Select Pounds in the Dependent variable list. Select Soil Condition, Potsize, Variety, Production Method, and Location as the Categorical predictors (factors), and then click the OK button.

For this example, we will force the main effects into the model and search for the best submodel that adds two-way interactions, if any such submodel is better than the main-effects-only submodel. Specify a factorial design to degree 2 so that the whole model includes all main and 2-way interaction effects. To do this, click the Between effects button to display the GLM Between Effects dialog box. Select the Use factorial design to specified degree option button, enter 2 in the degree field, and then click the OK button.
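To make the size of the whole model concrete, the following Python sketch enumerates the effects in a factorial design to degree 2 for the five factors selected above. It is an illustration only and is not part of STATISTICA; the function name factorial_effects is a hypothetical helper introduced here.

```python
from itertools import combinations

# Factor names taken from the variable selection step of this example.
factors = ["Soil Condition", "Potsize", "Variety", "Production Method", "Location"]

def factorial_effects(names, degree):
    """List every effect up to `degree`-way, lower-order effects first.

    Hypothetical helper for illustration; not a STATISTICA function.
    """
    effects = []
    for order in range(1, degree + 1):
        for combo in combinations(names, order):
            effects.append("*".join(combo))
    return effects

effects = factorial_effects(factors, 2)
main_effects = [e for e in effects if "*" not in e]
interactions = [e for e in effects if "*" in e]

# 5 main effects plus C(5, 2) = 10 two-way interactions = 15 effects in all.
print(len(main_effects), len(interactions), len(effects))
```

The whole model therefore contains 15 effects, of which the 10 two-way interactions are the candidates for the best-subset search.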

In the GRM Factorial ANOVA Quick Specs dialog box, select the Options tab. Select the Best subsets option button. Enter 5 in the Effects to force field to force the 5 main effects into the model. Specify a search through all submodels in a range of sizes: start with submodels of 6 effects (enter 6 in the Start field) and stop with submodels of 10 effects (enter 10 in the Stop field). Finally, we will use the default R squared option button as the best-subset criterion and the default options for the remainder of the specifications for the analysis; therefore, click the OK button to display the GRM Results dialog box.
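As a rough guide to how large this search is, the sketch below counts the candidate submodels for each subset size. It assumes that whole effects enter or leave the model together, that the 5 forced main effects are in every submodel, and that the remaining effects are the 10 two-way interactions of the degree-2 factorial design. The function name submodels_per_size is hypothetical, and the count says nothing about how GRM actually carries out the search.

```python
from math import comb

total_effects = 15  # 5 main effects + 10 two-way interactions
forced = 5          # main effects forced into every submodel
start, stop = 6, 10 # range of subset sizes searched

def submodels_per_size(total, n_forced, first, last):
    """Count the submodels of each size when `n_forced` effects are always included."""
    optional = total - n_forced
    return {size: comb(optional, size - n_forced) for size in range(first, last + 1)}

counts = submodels_per_size(total_effects, forced, start, stop)
for size, count in counts.items():
    print(size, count)
print("total", sum(counts.values()))
```

Under these assumptions there are 637 candidate submodels in all, so an exhaustive search is inexpensive for this design.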

If you want to run this example using GRM Syntax, you can run the following syntax program from the GRM Analysis Syntax Editor (see Methods for specifying designs).

The syntax program for the analysis is:

GRM;

Reviewing Results. On the GRM Results dialog box - Quick tab, in the Model building results group box, click the Summary of best subset regression button. The (abbreviated) Summary of best subsets spreadsheet, showing the R square and the standardized regression coefficients for each of the (default 10) best submodels of each subset size, is shown below.

With the default R squared criterion, the best subset will always be a submodel of the largest subset size that was searched, because R squared cannot decrease when an effect is added to a model. For this example, there were 10 effects in the "best" subset, but note in the ANOVA table for the final model that the Variety*Production Method 2-way interaction effect did not approach significance, p > .25. (Click the All effects button to view the Univariate Tests of Significance for Pounds spreadsheet.)
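The behavior of the R squared criterion can be illustrated with a small, self-contained Python sketch. It does not use the Tomatoes.sta data: the predictors and response are synthetic random numbers, and the helper names sse and r_squared are hypothetical. The sketch fits ordinary least squares by solving the normal equations, forces two predictors into every model, and records the best R squared for each subset size.

```python
import random
from itertools import combinations

def sse(columns, y):
    """Residual sum of squares for y regressed on an intercept plus `columns`."""
    n = len(y)
    X = [[1.0] + [c[i] for c in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    M = [[sum(X[i][r] * X[i][s] for i in range(n)) for s in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    # Gaussian elimination with partial pivoting.
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(M[r][p]))
        M[p], M[pivot] = M[pivot], M[p]
        for r in range(p + 1, k):
            factor = M[r][p] / M[p][p]
            for s in range(p, k + 1):
                M[r][s] -= factor * M[p][s]
    # Back substitution for the coefficients.
    b = [0.0] * k
    for r in reversed(range(k)):
        b[r] = (M[r][k] - sum(M[r][s] * b[s] for s in range(r + 1, k))) / M[r][r]
    return sum((y[i] - sum(X[i][j] * b[j] for j in range(k))) ** 2 for i in range(n))

def r_squared(columns, y):
    mean_y = sum(y) / len(y)
    sst = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - sse(columns, y) / sst

# Synthetic data (an assumption for illustration, not the tomato data).
random.seed(0)
n = 40
forced = [[random.gauss(0, 1) for _ in range(n)] for _ in range(2)]
optional = [[random.gauss(0, 1) for _ in range(n)] for _ in range(4)]
y = [forced[0][i] + 0.5 * optional[0][i] + random.gauss(0, 1) for i in range(n)]

# Best R squared for each subset size, with the forced predictors always included.
best_by_size = {}
for extra in range(len(optional) + 1):
    best_by_size[len(forced) + extra] = max(
        r_squared(forced + list(combo), y) for combo in combinations(optional, extra)
    )

for size in sorted(best_by_size):
    print(size, round(best_by_size[size], 4))
```

The best R squared never decreases as the subset size grows, which is why this criterion alone cannot indicate that a smaller submodel is preferable.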

If you modify this analysis by specifying Mallow's Cp as the best-subset criterion on the Options tab of the GRM Factorial ANOVA Quick Specs dialog box, you will find that the best subset has the same set of nine effects (with all p's < .10) as was found using stepwise regression in Example 1. This 9-effect subset is listed as subset number 8 in the Summary of best subsets spreadsheet shown above.
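Unlike R squared, the Cp criterion penalizes model size, so a smaller subset can be preferred. The sketch below uses the common textbook definition, Cp = SSE_p / MSE_full - n + 2p, where p counts the estimated parameters including the intercept. Whether GRM computes the statistic in exactly this form is an assumption, the function name mallows_cp is hypothetical, and the numbers are invented for illustration rather than taken from the tomato data.

```python
def mallows_cp(sse_sub, mse_full, n_obs, n_params):
    """Textbook Cp for a submodel.

    sse_sub  : residual sum of squares of the submodel
    mse_full : residual mean square of the whole (largest) model
    n_obs    : number of observations
    n_params : parameters in the submodel, including the intercept
    """
    return sse_sub / mse_full - n_obs + 2 * n_params

# Hypothetical values: 36 observations, whole model with 10 parameters.
n_obs = 36
sse_full, p_full = 52.0, 10
mse_full = sse_full / (n_obs - p_full)  # 52 / 26 = 2.0

cp_full = mallows_cp(sse_full, mse_full, n_obs, p_full)  # equals p_full by construction
cp_sub = mallows_cp(53.0, mse_full, n_obs, 9)            # one parameter fewer, slightly larger SSE

print(cp_full, cp_sub)
```

With these invented numbers the smaller submodel has the lower Cp (8.5 versus 10.0), because dropping one parameter costs only a small increase in the residual sum of squares.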

See also GRM - Index.