Classification Trees Startup Panel - Sampling Options Tab

Select the Sampling options tab of the Classification Trees Startup Panel to access the options described here.

Sampling parameters. There are three boxes under Sampling parameters: Seed for random number generator, V-fold cross-validation; v-value, and p-value for split variable selection. The first two boxes control the sampling that Statistica performs to obtain cross-validation error estimates; the third box controls how the split variable is selected.

Seed for random number generator. The positive integer value entered in the Seed for random number generator box is used as a seed for a random number generator that produces V-fold random subsamples from the learning sample to test the predictive accuracy of the computed classification trees.
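The role of the seed can be illustrated outside of Statistica. The following Python sketch is not Statistica's generator; the function name assign_folds and the fold-assignment scheme are assumptions made purely for illustration. It shows that a fixed seed yields the same random assignment of cases to subsamples on every run.

```python
import random

def assign_folds(n_cases, v, seed):
    """Randomly assign each of n_cases cases to one of v folds (hypothetical helper)."""
    rng = random.Random(seed)                 # private generator seeded explicitly
    folds = [i % v for i in range(n_cases)]   # fold labels with near-equal counts
    rng.shuffle(folds)                        # random assignment driven by the seed
    return folds

first_run = assign_folds(12, 3, seed=1)
second_run = assign_folds(12, 3, seed=1)      # same seed, same assignment
other_seed = assign_folds(12, 3, seed=2)      # a different seed will usually differ
```

Rerunning an analysis with the same seed therefore makes the cross-validation subsamples, and the estimates computed from them, repeatable in this sketch.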

V-fold cross-validation; v-value. The value entered in the V-fold cross-validation; v-value box determines the number of cross-validation samples that are generated from the learning sample to provide an estimate of the cross-validation (CV) cost for each classification tree in the tree sequence. For example, if the default 3-fold cross-validation is performed, three random samples are generated from the learning sample. Each sample in turn is held out: the classification trees are computed from the cases in the other two samples, and the cases in the held-out sample are used to test the predictive accuracy of the computed trees.
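The hold-out scheme described above can be sketched in Python. This is a minimal illustration, not Statistica's implementation: the names v_fold_splits and cv_cost are hypothetical, and a majority-class predictor stands in for a fitted classification tree so that the example stays self-contained.

```python
import random
from collections import Counter

def v_fold_splits(n_cases, v, seed):
    """Yield (train_idx, test_idx) pairs, one per fold (hypothetical helper)."""
    rng = random.Random(seed)
    order = list(range(n_cases))
    rng.shuffle(order)
    folds = [order[k::v] for k in range(v)]   # v disjoint random subsamples
    for k in range(v):
        test_idx = folds[k]                   # held-out subsample
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, test_idx

def cv_cost(labels, v=3, seed=1):
    """Misclassification rate over all held-out cases, using a
    majority-class stand-in instead of a real classification tree."""
    errors = 0
    for train_idx, test_idx in v_fold_splits(len(labels), v, seed):
        majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
        errors += sum(labels[i] != majority for i in test_idx)
    return errors / len(labels)

labels = ["a"] * 6 + ["b"] * 3
splits = list(v_fold_splits(len(labels), v=3, seed=1))
cost = cv_cost(labels, v=3, seed=1)
```

With v = 3, every case is used twice for computing the model and exactly once for testing it, so the error estimate is based on cases that were not used to build the model being tested.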

p-value for split variable selection. The value entered in the p-value for split variable selection box is used in the split variable selection process when a Discriminant-based split selection method has been selected. The p-value is used to determine whether the significance of Levene's F (a statistical test that is robust to violations of the distributional assumptions for ANOVA) or the significance of a standard univariate F is used as the criterion for split variable selection.
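One plausible reading of this rule, modeled on QUEST-style variable selection, is sketched below in Python. It is an assumption for illustration only: this page does not state Statistica's exact decision rule or whether any multiplicity adjustment is applied to the threshold. The function name select_split_variable and its arguments are hypothetical, and the p-values for each predictor are taken as already computed.

```python
def select_split_variable(anova_p, levene_p, threshold=0.05):
    """Pick a split variable from per-predictor p-values.

    anova_p and levene_p map predictor names to the p-values of the
    univariate F test and of Levene's F test, respectively.
    Returns (predictor_name, criterion_used).
    """
    best_anova = min(anova_p, key=anova_p.get)
    if anova_p[best_anova] <= threshold:
        return best_anova, "univariate F"      # means differ significantly
    best_levene = min(levene_p, key=levene_p.get)
    if levene_p[best_levene] <= threshold:
        return best_levene, "Levene's F"       # variances differ significantly
    return best_anova, "univariate F (not significant)"  # assumed fallback

choice = select_split_variable(
    anova_p={"x1": 0.40, "x2": 0.20},
    levene_p={"x1": 0.01, "x2": 0.30},
    threshold=0.05,
)
```

In this sketch, a smaller threshold makes it harder for the univariate F to qualify, so the selection falls through to Levene's F more often.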