General Best-Subset and Stepwise Regression

Builds a general regression model (GRM): a linear model for predicting continuous dependent variables. The parameters in Statistica allow full access to the GRM syntax for specifying models and for controlling the parameters for stepwise and best-subset selection of predictor effects (for categorical and continuous predictor variables). Default results include the ANOVA/ANCOVA (MANOVA/MANCOVA) table; set the Level of detail parameter to All results to produce tables of means and other statistics. Residual and predicted values can be computed on request.

General

Detail of computed results reported. Specifies the level of detail of the reported results. If All results is requested, Statistica produces all univariate results (for multivariate designs), descriptive statistics, details about the design terms, the whole-model R, regression coefficients, and the least-squares means for all effects. Residual and predicted statistics (for observations) can be requested as options.

Analysis syntax. Analysis syntax string for general regression models. You can enter the complete syntax here, for example as copied from a Statistica analysis. Leave this string empty, or set it to just GRM; to have the syntax created from the options specified below.

Design. Required; specifies the between-group design (categorical and continuous predictors). If no design is specified, a full factorial design is constructed for the categorical predictors, and main effects are evaluated for the continuous predictors.

Use the syntax:
DESIGN = Design specifications

Example 1.
DESIGN = GROUP | GENDER | TIME | PAID; {makes a full factorial design}

Example 2.
DESIGN = SEQUENCE + PERSON(SEQUENCE) + TREATMNT + SEQUENCE*TREATMNT;

Example 3.
DESIGN = MULLET | SHEEPSHD | CROAKER @2; {Makes factorial design to degree 2}

Example 4.
DESIGN = TEMPERAT | MULLET | SHEEPSHD | CROAKER - TEMPERAT; {Removes main effect for TEMPERAT from factorial design}

Example 5.
DESIGN = BLOCK + DEGREES + DEGREES*DEGREES + TIME + TIME*TIME + TIME*DEGREES;
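
The effect of the | operator, the @ degree limit, and the - removal shown in Examples 1, 3, and 4 can be illustrated with a short Python sketch. This is not Statistica code and does not parse GRM syntax; the function name expand_factorial and the order in which effects are listed are assumptions made for illustration, based only on the comments in the examples above.

```python
from itertools import combinations


def expand_factorial(factors, max_degree=None):
    """List main effects and interactions for factors crossed with '|'.

    max_degree mimics the '@n' suffix: interactions above that order are
    dropped. The ordering of the returned effects is an assumption.
    """
    if max_degree is None:
        max_degree = len(factors)
    effects = []
    for degree in range(1, max_degree + 1):
        for combo in combinations(factors, degree):
            effects.append("*".join(combo))
    return effects


# Example 1: full factorial of four factors -> 2**4 - 1 = 15 effects.
full = expand_factorial(["GROUP", "GENDER", "TIME", "PAID"])
print(len(full))  # 15

# Example 3: factorial design limited to degree 2.
degree2 = expand_factorial(["MULLET", "SHEEPSHD", "CROAKER"], max_degree=2)
print(degree2)

# Example 4: full factorial with the TEMPERAT main effect removed.
reduced = [
    effect
    for effect in expand_factorial(["TEMPERAT", "MULLET", "SHEEPSHD", "CROAKER"])
    if effect != "TEMPERAT"
]
print(len(reduced))  # 14
```

Listing the effects this way is a quick check on how many terms a design specification implies before the analysis is run.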

Intercept. Specifies whether the intercept (constant) is to be included in the model (i.e., a parameter is to be estimated for the intercept); the default is INTERCEPT=INCLUDE.

Model building method. Specifies a model building method.

Contrast coefficients. Specifies contrasts for least squares means; consult the Electronic Manual for syntax details.

Estimate (custom hypotheses). Optional; specify the coefficients that are to be used in the linear combination of parameter estimates for the custom hypothesis; multiple ESTIMATE specifications can appear in the same analysis. Note that tests of linear combinations of parameter estimates can also be requested from the Results dialog, where a convenient and efficient user interface is provided for specifying the coefficients.

Sweep delta 1.E-. Specifies the negative exponent for a base-10 constant Delta (delta = 10^-sdelta); the default value is 7. Delta is used (1) in sweeping, to detect redundant columns in the design matrix, and (2) for evaluating the estimability of hypotheses; specifically a value of 2*delta is used for the estimability check.

Inverse delta 1.E-. Specifies the negative exponent for a base-10 constant Delta (delta = 10^-idelta); the default value is 12. Delta is used to check for matrix singularity in matrix inversion calculations.
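
As a worked example of the two tolerance parameters, the following Python sketch converts the default exponents into the constants they describe. The helper name delta_from_exponent is hypothetical; only the formula delta = 10^-exponent, the defaults 7 and 12, and the 2*delta estimability value come from the descriptions above.

```python
def delta_from_exponent(exponent):
    """Return the base-10 constant 10**(-exponent)."""
    return 10.0 ** (-exponent)


# Sweep delta: default exponent 7.
sweep_delta = delta_from_exponent(7)

# The estimability check uses twice the sweep delta.
estimability_tolerance = 2 * sweep_delta

# Inverse delta: default exponent 12.
inverse_delta = delta_from_exponent(12)

print(f"sweep delta:            {sweep_delta:.0e}")
print(f"estimability tolerance: {estimability_tolerance:.0e}")
print(f"inverse delta:          {inverse_delta:.0e}")
```

A larger exponent gives a smaller delta, so fewer columns or matrices are flagged as redundant or singular.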

Generates data source, if N for input less than. Generates a data source for further analyses with other Data Miner nodes if the input data source has fewer than k observations, as specified in this edit field; note that parameter k (number of observations) will be evaluated against the number of observations in the input data source, not the number of valid or selected observations.

Parameters for Stepwise Selection

Stepwise selection criterion. Specifies the criterion to use for stepwise selection of predictors. Note that the F statistic is only available for analysis problems with continuous (single degree of freedom) predictor variables and a single dependent variable.

p to enter. Specifies p-to-enter for stepwise selection of predictors.

p to remove. Specifies p-to-remove for stepwise selection of predictors.

F to enter. Specifies F-to-enter for stepwise selection of predictors (available only for models with continuous predictor variables and a single dependent variable).

F to remove. Specifies F-to-remove for stepwise selection of predictors (available only for models with continuous predictor variables and a single dependent variable).

Maximum number of steps. Specifies the maximum number of steps for stepwise selection of variables.
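
The way F-to-enter, F-to-remove, and the maximum number of steps interact can be sketched in Python. This is a textbook forward stepwise procedure with a removal check for continuous predictors and a single dependent variable, not Statistica's implementation. The function names rss and stepwise, the way one step is counted (one entry attempt plus one removal attempt), and the threshold values in the usage lines are all assumptions.

```python
def rss(columns, y):
    """Residual sum of squares of y regressed on an intercept plus columns."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[j] * r[k] for r in rows) for k in range(p)]
         + [sum(r[j] * yi for r, yi in zip(rows, y))] for j in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        pivot = max(range(k, p), key=lambda idx: abs(a[idx][k]))
        a[k], a[pivot] = a[pivot], a[k]
        for r in range(p):
            if r != k:
                factor = a[r][k] / a[k][k]
                a[r] = [u - factor * v for u, v in zip(a[r], a[k])]
    beta = [a[j][p] / a[j][j] for j in range(p)]
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def stepwise(predictors, y, f_enter, f_remove, max_steps):
    """Return the names of the selected predictors, in order of entry."""
    n = len(y)
    model = []

    def fit(names):
        return rss([predictors[name] for name in names], y)

    for _ in range(max_steps):
        changed = False
        # Entry: add the candidate with the largest partial F above f_enter.
        base = fit(model)
        df_resid = n - len(model) - 2
        best = None
        for name in predictors:
            if name not in model:
                new = fit(model + [name])
                f = (base - new) / (new / df_resid)
                if f > f_enter and (best is None or f > best[1]):
                    best = (name, f)
        if best:
            model.append(best[0])
            changed = True
        # Removal: drop the predictor with the smallest partial F below f_remove.
        if model:
            full = fit(model)
            df_resid = n - len(model) - 1
            worst = None
            for name in model:
                reduced = fit([v for v in model if v != name])
                f = (reduced - full) / (full / df_resid)
                if f < f_remove and (worst is None or f < worst[1]):
                    worst = (name, f)
            if worst:
                model.remove(worst[0])
                changed = True
        if not changed:
            break
    return model


# Made-up data: y depends on x1; x2 is unrelated.
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x2 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
noise = [0.1, -0.2, 0.15, -0.05, 0.2, -0.1, 0.05, -0.15, 0.1, -0.1]
y = [2 * a + e for a, e in zip(x1, noise)]

selected = stepwise({"x1": x1, "x2": x2}, y,
                    f_enter=4.0, f_remove=3.9, max_steps=10)
print(selected)
```

Keeping F-to-remove no larger than F-to-enter prevents a predictor from being entered and removed repeatedly; the step limit guards against that case as well.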

Parameters for Best-Subset Selection

Best subsets measure. Specifies the selection criterion for best subset selection of predictors. These options are only available (meaningful) for analysis problems with a single dependent variable. In designs with multiple dependent variables, the selection of the best subset is based on the p-value for the multivariate test (Wilks' Lambda).

Start for best subsets. Specifies the smallest number of predictors to be included in the model chosen via best subset selection, i.e., the start of the search for the best subset of predictors.

Stop for best subsets. Specifies the maximum number of predictors to be included in the model chosen via best subset selection.

Number of subsets to display. Specifies the number of subsets to display in the results; Statistica will keep a log of the best k predictor models of any given size, using k as specified by this parameter.

Number of variables to force. Specifies the number of predictors to force into the model, i.e., to select into all models considered during the best-subset selection of predictors. Statistica will force the first k predictors in the list of continuous predictors into the model, where k is the value specified here.
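
The interplay of the start, stop, display, and force parameters can be sketched in Python as an enumeration over candidate subsets. This is a minimal illustration, not Statistica's search algorithm. The names candidate_subsets and best_per_size are hypothetical, the score function stands in for whichever best-subsets measure is selected, and counting the forced predictors toward the subset size is an assumption.

```python
from itertools import combinations


def candidate_subsets(predictors, start, stop, n_forced=0):
    """Yield every subset of size start..stop containing the first n_forced predictors."""
    forced = predictors[:n_forced]
    free = predictors[n_forced:]
    for size in range(max(start, n_forced), stop + 1):
        for extra in combinations(free, size - n_forced):
            yield forced + list(extra)


def best_per_size(predictors, score, start, stop, n_forced=0, keep=1):
    """Keep the `keep` highest-scoring subsets for each subset size."""
    by_size = {}
    for subset in candidate_subsets(predictors, start, stop, n_forced):
        by_size.setdefault(len(subset), []).append((score(subset), subset))
    return {
        size: [subset for _, subset in
               sorted(items, key=lambda item: item[0], reverse=True)[:keep]]
        for size, items in by_size.items()
    }


# Toy score: a made-up weight per predictor (stands in for a fit measure).
weights = {"A": 0.1, "B": 0.5, "C": 0.3, "D": 0.2}


def toy_score(subset):
    return sum(weights[name] for name in subset)


result = best_per_size(["A", "B", "C", "D"], toy_score,
                       start=2, stop=3, n_forced=1, keep=2)
print(result)
```

Because the number of subsets grows quickly with the number of predictors, narrowing the start and stop range or forcing predictors reduces the search considerably.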

Selected Results

Least square means. Creates the expected marginal means, given the current model; either all marginal means tables can be computed, or only the means for the highest-order effect of the factorial design.

Post Hoc Tests. Performs post-hoc comparisons between the means in the design.

Lack of fit. Requests the computation of pure error for testing the lack-of-fit hypothesis.
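
For illustration, the textbook pure-error computation can be sketched in Python: observations with identical predictor settings are treated as replicates, and the pure-error sum of squares is the variation of the dependent variable within those replicate groups. The function name pure_error and the data are made up, and whether Statistica defines replicates in exactly this way is an assumption.

```python
from collections import defaultdict


def pure_error(predictor_rows, y):
    """Return (pure-error sum of squares, pure-error degrees of freedom)."""
    groups = defaultdict(list)
    for row, yi in zip(predictor_rows, y):
        groups[tuple(row)].append(yi)
    ss = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        ss += sum((v - mean) ** 2 for v in values)
    # Each replicate group uses one degree of freedom for its mean.
    df = len(y) - len(groups)
    return ss, df


# Made-up data: two replicated settings and one unreplicated setting.
rows = [(1,), (1,), (2,), (2,), (3,)]
response = [1.0, 3.0, 2.0, 2.0, 5.0]
ss_pe, df_pe = pure_error(rows, response)
print(ss_pe, df_pe)
```

The lack-of-fit sum of squares is then the residual sum of squares of the fitted model minus this pure-error term, which requires at least one replicated setting.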

Tests homogeneity of variances. Tests the homogeneity of variances/covariances assumption. One of the assumptions of univariate ANOVA is that the variances are equal (homogeneous) across the cells of the between-groups design. In the multivariate case (MANOVA), this assumption applies to the variance/covariance matrix of dependent variables (and covariates).

Plots of means vs. std. dev. Plots the (unweighted) marginal means (see also the Means tab) for the selected Variables against the standard deviations.

Residual analysis. In addition to the predicted, observed, and residual values, Statistica will compute the (default) 95% Prediction intervals and 95% Confidence limits, the Standardized predicted and Standardized residual score, the Leverage values, the Deleted residual and Studentized deleted residual scores, Mahalanobis and Cook distance scores, the DFFITS statistic, and the Standardized DFFITS statistic.
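
A few of these statistics can be illustrated for the simplest case, a regression on one continuous predictor, using standard textbook formulas. The Python sketch below computes residuals, leverage values, deleted residuals, and Cook's distances; the function names and data are made up, and Statistica's exact scaling of these statistics may differ from the definitions assumed here.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - slope * xbar, slope


def diagnostics(x, y):
    """Per-observation residual diagnostics for a one-predictor regression."""
    n = len(x)
    n_params = 2  # intercept and slope
    intercept, slope = fit_line(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - n_params)
    rows = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - xbar) ** 2 / sxx  # leverage
        rows.append({
            "residual": e,
            "leverage": h,
            # Prediction error when the observation is left out of the fit.
            "deleted_residual": e / (1.0 - h),
            "cook": e * e * h / (n_params * mse * (1.0 - h) ** 2),
        })
    return rows


# Made-up data with one observation far from the others in x.
x = [1.0, 2.0, 3.0, 4.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 12.0]
table = diagnostics(x, y)
for row in table:
    print({name: round(value, 4) for name, value in row.items()})
```

The last observation has the largest leverage because its x value lies far from the mean; a high-leverage point with a sizable residual yields a large Cook's distance.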

Normal probability plot. Normal probability plot of residuals.

Deployment. Deployment is available if the Statistica installation is licensed for this feature.

Generates C/C++ code. Generates C/C++ code for deployment of predictive model (for a single dependent variable only).

Generates SVB code. Generates Statistica Visual Basic code for deployment of predictive model (for a single dependent variable only).

Generates PMML code. Generates PMML (Predictive Models Markup Language) code for deployment of predictive model (for a single dependent variable only). This code can be used via the Rapid Deployment options to efficiently compute predictions for (score) large data sets.

Saves C/C++ code. Saves C/C++ code for deployment of predictive model (for a single dependent variable only).

File name for C/C++ code. Specify the name and location of the file in which to save the (C/C++) deployment code information.

Saves SVB code. Saves Statistica Visual Basic code for deployment of predictive model (for a single dependent variable only).

File name for SVB code. Specify the name and location of the file in which to save the (SVB/VB) deployment code information.

Saves PMML code. Saves PMML (Predictive Models Markup Language) code for deployment of predictive model (for a single dependent variable only). This code can be used via the Rapid Deployment options to efficiently compute predictions for (score) large data sets.

File name for PMML (XML) code. Specify the name and location of the file in which to save the (PMML/XML) deployment code information.