Best-subset and stepwise discriminant function analysis with categorical
factor effects; builds a linear discriminant function model for continuous
and categorical predictor variables, using ANCOVA-like designs. By default,
only main effects will be evaluated for categorical predictors; you can
also construct factorial designs up to a certain degree (e.g., to degree
3, to include all 2-way and 3-way interactions of categorical predictors).
Note that the algorithm for stepwise and best subset selection of categorical
factor effects ensures that complete (possibly multiple-degrees-of-freedom)
effects are moved into and out of the model.

The General Discriminant Analysis module provides functionality that makes
this technique a general tool for classification and data mining. However,
most, if not all, textbook treatments of discriminant function analysis
are limited to simple and stepwise analyses with single-degree-of-freedom
continuous predictors. The literature does not yet document experience with
the robustness and effectiveness of these techniques when they are
generalized in the manner provided in this module. Best-subset methods in
particular, when used in conjunction with categorical predictors, should be
considered a heuristic search method rather than a statistical analysis
technique.

**General**

**Model building method**. Specifies the model building method.

**Detail of computed results reported**. Specifies the level of detail of
the reported results. If Minimal level of detail is requested, the output
contains Chi-square tests of roots, discriminant (canonical) function
coefficients, factor structure coefficients, and classification function
coefficients. If All results is requested, Statistica will also report
various descriptive statistics and classification summary statistics.
Classification statistics for each case can be requested separately as an
option.

**Construct factorial to degree**. Specifies the factorial degree
of the design to be tested; Statistica will construct an
ANCOVA-like factorial design for all categorical predictors up to the
specified degree (i.e., by default up to degree 1, so that the final model
will include only main effects for categorical predictors; if you set
this parameter to 2, then all two-way interactions will also be included,
and so on).
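
As an illustration of what "factorial to degree" means, the following minimal sketch lists the effects that a design of a given degree would contain. The function name `factorial_effects` and the factor names are hypothetical and only serve the example; this is not Statistica's implementation.

```python
from itertools import combinations


def factorial_effects(factors, degree=1):
    """List main effects and interactions up to the given degree.

    Hypothetical helper for illustration only. Each effect is written
    as the factor names joined by '*'.
    """
    effects = []
    # Order 1 gives main effects, order 2 gives two-way interactions, etc.
    for order in range(1, min(degree, len(factors)) + 1):
        for combo in combinations(factors, order):
            effects.append("*".join(combo))
    return effects


effects_deg1 = factorial_effects(["A", "B", "C"], degree=1)
effects_deg2 = factorial_effects(["A", "B", "C"], degree=2)
print(effects_deg1)  # ['A', 'B', 'C']
print(effects_deg2)  # ['A', 'B', 'C', 'A*B', 'A*C', 'B*C']
```

With three categorical predictors, degree 3 would add the single three-way interaction `A*B*C` to the list above.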

**Priors**. Sets the prior classification probabilities for classifying
observations. The default specification, Estimated, sets the prior
classification probabilities proportional to the observed group (class)
N's; the Equal option assigns equal probabilities to each group or class
specified in the categorical dependent variable.
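
The difference between the two prior specifications can be sketched as follows. The function `prior_probabilities` and the sample labels are hypothetical and are used only to show the arithmetic.

```python
from collections import Counter


def prior_probabilities(labels, method="estimated"):
    """Return a dict mapping each class to its prior probability.

    Hypothetical helper: 'estimated' uses priors proportional to the
    observed class counts, 'equal' gives every class the same prior.
    """
    counts = Counter(labels)
    if method == "estimated":
        n = len(labels)
        return {group: count / n for group, count in counts.items()}
    if method == "equal":
        return {group: 1 / len(counts) for group in counts}
    raise ValueError("method must be 'estimated' or 'equal'")


labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
estimated = prior_probabilities(labels, "estimated")  # proportional to N's
equal = prior_probabilities(labels, "equal")          # 1/3 for each class
print(estimated)
print(equal)
```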

**Case statistics**. Creates and reports selected case statistics.

**Sweep delta 1.E-**. Specifies the negative exponent for a base-10 constant
Delta (delta = 10^-sdelta); the default value is 7. Delta is used (1)
in sweeping, to detect redundant columns in the design matrix, and (2)
for evaluating the estimability of hypotheses; specifically a value of
2*delta is used for the estimability check.

**Inverse delta 1.E-**. Specifies the negative exponent for a base-10 constant
Delta (delta = 10^-idelta); the default value is 12. Delta is used to
check for matrix singularity in matrix inversion calculations.
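
As a small worked example, the tolerances implied by the default exponents of the two delta parameters can be computed as shown below. The helper `tolerance` and the variable names are hypothetical; only the formula delta = 10^-exponent and the factor of 2 for the estimability check come from the descriptions above.

```python
def tolerance(negative_exponent):
    """Convert a negative exponent setting into delta = 10^-exponent."""
    return 10.0 ** -negative_exponent


sweep_delta = tolerance(7)            # default sweep delta
estimability_tol = 2 * sweep_delta    # value used for the estimability check
inverse_delta = tolerance(12)         # default inverse delta

print(f"sweep delta:      {sweep_delta:.0e}")
print(f"estimability tol: {estimability_tol:.0e}")
print(f"inverse delta:    {inverse_delta:.0e}")
```

A larger exponent therefore means a stricter (smaller) tolerance.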

**Generates data source, if N for input less than**. Generates a data source
for further analyses with other Data Miner nodes if the input data source
has fewer than k observations, as specified in this edit field; note that
parameter k (number of observations) will be evaluated against the number
of observations in the input data source, not the number of valid or selected
observations.

**Parameters for Stepwise Selection**

**Stepwise selection criterion**. Specifies the criterion to use for stepwise
selection of predictors. Note that the F statistic is only available for
designs that do not include categorical factor effects.

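**p to enter**. Specifies the p-value threshold for entering an effect
into the model during stepwise selection.

**p to remove**. Specifies the p-value threshold for removing an effect
from the model during stepwise selection.

**F to enter**. Specifies the F-value threshold for entering a predictor
into the model during stepwise selection.

**F to remove**. Specifies the F-value threshold for removing a predictor
from the model during stepwise selection.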

**Maximum number of steps**. Specifies the maximum number of steps for
stepwise selection of variables.

**Parameters for Best-Subset Selection**

**Best subsets measure**. Specifies the
selection criterion for best subset selection of predictors. To use cross-validation
misclassification rates, a cross-validation variable (learning sample)
must be specified.

**Start for best subsets**. Specifies the smallest number of predictors to
be included in the model chosen via best subset selection, i.e., the start
of the search for the best subset of predictors.

**Stop for best subsets**. Specifies the maximum number of predictors to
be included in the model chosen via best subset selection.

**Number of subsets to display**. Specifies the number of subsets
to display in the results; Statistica will keep a log of
the best k predictor models of any given size, using k as specified by
this parameter.

**Number of variables to force**. Specifies the number of predictors to force
into the model, i.e., to select into all models considered during the
best-subset selection of predictors. Statistica will force the
first k predictors in the list of continuous predictors into the model,
with k as specified by this parameter.
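
How the start, stop, display, and force parameters interact can be sketched with a simple exhaustive search. This is a minimal illustration under stated assumptions, not Statistica's algorithm: the function `best_subsets`, the `toy_rate` score, and the predictor names are hypothetical; the score is assumed to be "lower is better" (like a misclassification rate); and forced predictors are assumed to count toward the subset size.

```python
import heapq
from itertools import combinations


def best_subsets(predictors, score, start, stop, n_display, n_force=0):
    """Keep the n_display best subsets of each size from start to stop.

    The first n_force predictors are included in every candidate subset.
    `score` maps a tuple of predictor names to a number (lower is better).
    """
    forced = tuple(predictors[:n_force])
    free = predictors[n_force:]
    results = {}
    for size in range(max(start, n_force), stop + 1):
        # Every candidate is the forced block plus a choice of free predictors.
        candidates = (forced + combo
                      for combo in combinations(free, size - n_force))
        results[size] = heapq.nsmallest(n_display, candidates, key=score)
    return results


# Toy score for illustration only: more "useful" predictors give a lower rate.
usefulness = {"x1": 1, "x2": 4, "x3": 3, "x4": 2}


def toy_rate(subset):
    return 1.0 / (1 + sum(usefulness[p] for p in subset))


log = best_subsets(["x1", "x2", "x3", "x4"], toy_rate,
                   start=2, stop=3, n_display=2, n_force=1)
for size, subsets in log.items():
    print(size, subsets)
```

In this toy run, `x1` appears in every logged subset because it is the first predictor in the list and one predictor is forced.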

**Deployment**

Deployment is available if the Statistica installation is licensed for
this feature.

**Generates C/C++ code**. Generates C/C++ code for deployment of the
predictive model.

**Generates SVB code**. Generates Statistica Visual Basic code for
deployment of the predictive model.

**Generates PMML code**. Generates PMML (Predictive Models Markup Language)
code for deployment of the predictive model. This code can be used via the
Rapid Deployment options to efficiently compute predictions for (score)
large data sets.

**Saves C/C++ code**. Saves C/C++ code for deployment of the predictive
model.

**File name for C/C++ code**. Specifies the name and location of the file
in which to save the (C/C++) deployment code information.

**Saves SVB code**. Saves Statistica Visual Basic code for deployment of
the predictive model.

**File name for SVB code**. Specifies the name and location of the file
in which to save the (SVB/VB) deployment code information.

**Saves PMML code**. Saves PMML (Predictive Models Markup Language) code for
deployment of the predictive model. This code can be used via the Rapid
Deployment options to efficiently compute predictions for (score) large
data sets.

**File name for PMML (XML) code**. Specifies the name and location of the
file in which to save the (PMML/XML) deployment code information.