Best-Subset and Stepwise GDA ANCOVA

Best-subset and stepwise discriminant function analysis with categorical factor effects; builds a linear discriminant function model for continuous and categorical predictor variables, using ANCOVA-like designs. By default, only main effects will be evaluated for categorical predictors; you can also construct factorial designs up to a certain degree (e.g., to degree 3, to include all 2-way and 3-way interactions of categorical predictors). Note that the algorithm for stepwise and best subset selection of categorical factor effects ensures that complete (possibly multiple-degrees-of-freedom) effects are moved into and out of the model.

The General Discriminant Analysis module provides functionality that makes this technique a general tool for classification and data mining. However, most, if not all, textbook treatments of discriminant function analysis are limited to simple and stepwise analyses with single-degree-of-freedom continuous predictors. No experience is reported in the literature regarding the robustness and effectiveness of these techniques when they are generalized in the manner provided in this module. The use of best-subset methods, in particular in conjunction with categorical predictors, should therefore be considered a heuristic search method rather than a statistical analysis technique.

General

Model building method. Specifies a model building method.

Detail of computed results reported. Specifies the level of detail of the computed results. If Minimal level of detail is requested, the output contains Chi-square tests of roots, discriminant (canonical) function coefficients, factor structure coefficients, and classification function coefficients. If All results is requested, Statistica will also report various descriptive statistics and classification summary statistics. Classification statistics for each case can be requested separately as an option.

Construct factorial to degree. Specifies the factorial degree of the design to be tested; Statistica will construct an ANCOVA-like factorial design for all categorical predictors up to the specified degree (i.e., by default up to degree 1, so that the final model will include only main effects for categorical predictors; if you set this parameter to 2, then all two-way interactions will also be included, and so on).
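As an illustration of what "factorial to degree" means, the following minimal Python sketch enumerates the effects of a design for a given degree. The function name factorial_effects and the "A*B" notation for interactions are assumptions made for this example; they are not part of Statistica.

```python
from itertools import combinations

def factorial_effects(factors, degree=1):
    """List main effects and interactions of categorical factors up to `degree`.

    Hypothetical helper for illustration only; interactions are written "A*B".
    """
    effects = []
    # Order 1 = main effects, order 2 = two-way interactions, and so on.
    for order in range(1, min(degree, len(factors)) + 1):
        for combo in combinations(factors, order):
            effects.append("*".join(combo))
    return effects

print(factorial_effects(["A", "B", "C"], degree=1))
# ['A', 'B', 'C']
print(factorial_effects(["A", "B", "C"], degree=2))
# ['A', 'B', 'C', 'A*B', 'A*C', 'B*C']
```

With three factors, degree 3 adds the single three-way interaction A*B*C to the list above.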

Priors. Sets the prior probabilities for classifying observations. The default specification is Estimated, which sets the prior classification probabilities proportional to the observed group (class) N's; use the Equal option to assign equal probabilities to each group or class specified in the categorical dependent variable.
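The two prior specifications can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above; the function name prior_probabilities and the lowercase method strings are assumptions for this example.

```python
from collections import Counter

def prior_probabilities(labels, method="estimated"):
    """Return a {group: prior} mapping for the observed class labels.

    "estimated": priors proportional to the observed group sizes.
    "equal":     the same prior for every observed group.
    (Hypothetical helper; names are illustrative.)
    """
    counts = Counter(labels)
    if method == "estimated":
        n = len(labels)
        return {group: count / n for group, count in counts.items()}
    if method == "equal":
        return {group: 1 / len(counts) for group in counts}
    raise ValueError("method must be 'estimated' or 'equal'")

labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
print(prior_probabilities(labels, "estimated"))  # a: 6/10, b: 3/10, c: 1/10
print(prior_probabilities(labels, "equal"))      # 1/3 for each of a, b, c
```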

Case statistics. Creates and reports selected case statistics.

Sweep delta 1.E-. Specifies the negative exponent for a base-10 constant Delta (delta = 10^-sdelta); the default value is 7. Delta is used (1) in sweeping, to detect redundant columns in the design matrix, and (2) for evaluating the estimability of hypotheses; specifically a value of 2*delta is used for the estimability check.
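To make the role of delta in sweeping concrete, here is a minimal pure-Python sketch of a sweep over a cross-product matrix X'X that flags a column as redundant when its pivot falls below delta = 10^-sdelta. The function name redundant_columns and the use of an absolute (unscaled) pivot comparison are assumptions for this example; the exact test Statistica applies is not described here.

```python
def redundant_columns(xtx, sdelta=7):
    """Sweep a symmetric cross-product matrix and return the indices of
    columns whose pivot is smaller than delta = 10**-sdelta.

    Illustrative sketch only; comparing the raw pivot to delta is an assumption.
    """
    delta = 10.0 ** -sdelta
    a = [row[:] for row in xtx]  # work on a copy
    n = len(a)
    redundant = []
    for k in range(n):
        pivot = a[k][k]
        if abs(pivot) < delta:
            # Column k is (numerically) a linear combination of earlier columns.
            redundant.append(k)
            continue
        # Update all entries outside row k and column k.
        for i in range(n):
            if i == k:
                continue
            for j in range(n):
                if j != k:
                    a[i][j] -= a[i][k] * a[k][j] / pivot
        # Scale the pivot row and pivot column, then invert the pivot.
        for i in range(n):
            if i != k:
                a[i][k] = -a[i][k] / pivot
                a[k][i] = a[k][i] / pivot
        a[k][k] = 1.0 / pivot
    return redundant

# Design with an intercept and two indicator columns that sum to the intercept.
xtx = [[4.0, 2.0, 2.0],
       [2.0, 2.0, 0.0],
       [2.0, 0.0, 2.0]]
print(redundant_columns(xtx))  # [2]
```

In this example the third column adds no information beyond the first two, so its pivot is swept down to zero and it is flagged. With the default exponent of 7, delta is 1e-7, and the estimability check described above would use 2e-7.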

Inverse delta 1.E-. Specifies the negative exponent for a base-10 constant Delta (delta = 10^-idelta); the default value is 12. Delta is used to check for matrix singularity in matrix inversion calculations.

Generates data source, if N for input less than. Generates a data source for further analyses with other Data Miner nodes if the input data source has fewer than k observations, as specified in this edit field; note that parameter k (number of observations) will be evaluated against the number of observations in the input data source, not the number of valid or selected observations.

Parameters for Stepwise Selection

Stepwise selection criterion. Specifies the criterion to use for stepwise selection of predictors. Note that the F statistic is only available for designs that do not include categorical factor effects.

p to enter. Specifies p-to-enter for stepwise selection of predictors.

p to remove. Specifies p-to-remove for stepwise selection of predictors.

F to enter. Specifies F-to-enter for stepwise selection of predictors (available for single continuous dependent variables only). Note that the F statistic is only available for designs that do not include categorical factor effects.

F to remove. Specifies F-to-remove for stepwise selection of predictors (available for single continuous dependent variables only). Note that the F statistic is only available for designs that do not include categorical factor effects.

Maximum number of steps. Specifies the maximum number of steps for stepwise selection of variables.
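The interplay of p to enter, p to remove, and the maximum number of steps can be sketched as a simple loop. This is a generic stepwise procedure under stated assumptions, not Statistica's implementation: the callback p_value(effect, others) is hypothetical and stands for whatever test yields the p-value of an effect given the other effects in the model, the default thresholds are placeholders, and counting a removal as one step is also an assumption. Each entry in effects represents a complete effect, so multiple-degrees-of-freedom effects enter or leave as a unit.

```python
def stepwise_select(effects, p_value, p_enter=0.05, p_remove=0.05, max_steps=100):
    """Generic stepwise selection sketch.

    effects : list of effect names (each a complete effect).
    p_value : hypothetical callback p_value(effect, others) -> float.
    """
    model = []
    for _ in range(max_steps):
        # Removal check: drop the weakest effect if it no longer qualifies.
        if model:
            p_in = {e: p_value(e, [m for m in model if m != e]) for e in model}
            worst = max(p_in, key=p_in.get)
            if p_in[worst] > p_remove:
                model.remove(worst)
                continue
        # Entry check: add the strongest remaining candidate if it qualifies.
        p_out = {e: p_value(e, model) for e in effects if e not in model}
        if not p_out:
            break
        best = min(p_out, key=p_out.get)
        if p_out[best] >= p_enter:
            break
        model.append(best)
    return model

# Toy p-values that do not depend on the rest of the model (illustration only).
toy_p = {"A": 0.001, "B": 0.03, "C": 0.40}
selected = stepwise_select(["A", "B", "C"], lambda e, others: toy_p[e])
print(selected)  # ['A', 'B']
```

Lowering max_steps cuts the search short; with max_steps=1 the toy example above stops after entering only A.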

Parameters for Best-Subset Selection

Best subsets measure. Specifies the selection criterion for best subset selection of predictors. To use cross-validation misclassification rates, a cross-validation variable (learning sample) must be specified.

Start for best subsets. Specifies the smallest number of predictors to be included in the model chosen via best subset selection, i.e., the start of the search for the best subset of predictors.

Stop for best subsets. Specifies the maximum number of predictors to be included in the model chosen via best subset selection.

Number of subsets to display. Specifies the number of subsets to display in the results; Statistica will keep a log of the best k predictor models of any given size, using k as specified by this parameter.

Number of variables to force. Specifies the number of predictors to force into the model, i.e., to include in all models considered during the best-subset selection of predictors. Statistica will force the first k predictors in the list of continuous predictors into the model, with k as specified here.
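How the four best-subset parameters interact can be sketched as an exhaustive search. This is a minimal illustration under assumptions: the score callback is hypothetical (lower is taken to be better, as with a misclassification rate), the parameter names start, stop, n_display, and n_forced are invented for this example, and the real module may search differently.

```python
from itertools import combinations

def best_subsets(predictors, score, start=1, stop=None, n_display=3, n_forced=0):
    """Exhaustive best-subset sketch.

    predictors : ordered list; the first n_forced entries are always included.
    score      : hypothetical callback score(subset) -> float, lower is better.
    Returns {subset size: up to n_display best subsets of that size}.
    """
    forced = tuple(predictors[:n_forced])
    free = list(predictors[n_forced:])
    stop = len(predictors) if stop is None else stop
    results = {}
    for size in range(max(start, n_forced), min(stop, len(predictors)) + 1):
        # Every candidate contains the forced predictors plus a choice of free ones.
        subsets = [forced + extra for extra in combinations(free, size - n_forced)]
        results[size] = sorted(subsets, key=score)[:n_display]
    return results

# Toy score: more "useful" predictors give a lower (better) score.
usefulness = {"x1": 0.10, "x2": 0.30, "x3": 0.20, "x4": 0.05}
toy_score = lambda subset: -sum(usefulness[p] for p in subset)

log = best_subsets(["x1", "x2", "x3", "x4"], toy_score,
                   start=2, stop=3, n_display=2, n_forced=1)
for size, subsets in log.items():
    print(size, subsets)
```

In this toy run x1 is forced into every model, only models with two or three predictors are considered, and the two best models of each size are kept.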

Deployment. Deployment is available if the Statistica installation is licensed for this feature.

Generates C/C++ code. Generates C/C++ code for deployment of predictive model.

Generates SVB code. Generates Statistica Visual Basic code for deployment of predictive model.

Generates PMML code. Generates PMML (Predictive Models Markup Language) code for deployment of predictive model. This code can be used via the Rapid Deployment options to efficiently compute predictions for (score) large data sets.

Saves C/C++ code. Saves C/C++ code for deployment of predictive model.

File name for C/C++ code. Specifies the name and location of the file in which to save the (C/C++) deployment code information.

Saves SVB code. Saves Statistica Visual Basic code for deployment of predictive model.

File name for SVB code. Specifies the name and location of the file in which to save the (SVB/VB) deployment code information.

Saves PMML code. Saves PMML (Predictive Models Markup Language) code for deployment of predictive model. This code can be used via the Rapid Deployment options to efficiently compute predictions for (score) large data sets.

File name for PMML (XML) code. Specifies the name and location of the file in which to save the (PMML/XML) deployment code information.