ITrees Extended Options - Advanced Tab

The options available on the Advanced tab of the ITrees Extended Options dialog box are specific to the Model building method selected on the Interactive Trees Startup Panel - Quick tab, i.e., they depend on whether the current analysis is a Classification and Regression Trees (C&RT) analysis or (Exhaustive) CHAID analysis.

Options for C&RT

Number of surrogates. Cases (observations) with missing data can still be classified, and thus included in the analysis, by choosing "similar" predictors (surrogates) that have valid data. Cases with missing values in the response are treated as "prediction samples," and cases with missing values in a predictor are treated as "surrogate samples." The entry in the Number of surrogates box controls the number of surrogates that can be chosen by the analysis during the tree-building process. By default, the number of surrogates is 0 (zero), and cases with missing data values are excluded from the analysis.

In general, at every step during the tree-building process, STATISTICA identifies the variable for the next split that most improves the accuracy of prediction. If the value of the chosen variable is missing for a particular observation (case), the program looks to the next-best split variable to act as a "surrogate" for the best variable. If the value of that variable is missing as well, the program looks to the third-best split variable, and so on. The Number of surrogates option determines how far down the list of predictors (sorted by the degree of improvement in the accuracy of prediction provided by each respective split candidate) the program will go when attempting to find a surrogate for a variable that has missing data for a particular case.
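
The fallback logic described above can be illustrated with a minimal Python sketch. It is not STATISTICA's implementation: the function name pick_split_variable, the dictionary representation of a case, and the use of None for a missing value are all assumptions made for illustration. The sketch only shows how the Number of surrogates setting limits how far down the ranked list of predictors the search may go.

```python
from typing import Mapping, Optional, Sequence


def pick_split_variable(
    case: Mapping[str, Optional[float]],
    ranked_predictors: Sequence[str],
    n_surrogates: int,
) -> Optional[str]:
    """Return the predictor to use for this case, or None if none is usable.

    ranked_predictors is assumed to be sorted best-first by the improvement
    each split candidate provides. The best predictor is tried first, then
    at most n_surrogates further predictors (hypothetical helper, for
    illustration only).
    """
    # The best predictor plus the allowed number of surrogates.
    candidates = ranked_predictors[: 1 + n_surrogates]
    for name in candidates:
        # A value of None stands for a missing data value in this sketch.
        if case.get(name) is not None:
            return name
    # No usable predictor: the case cannot be sent down this split.
    return None


case = {"age": None, "income": 52000.0, "tenure": 3.0}
ranked = ["age", "income", "tenure"]

no_surrogate = pick_split_variable(case, ranked, n_surrogates=0)
one_surrogate = pick_split_variable(case, ranked, n_surrogates=1)
```

With the default of 0 surrogates, the case in this example has no usable split variable because its value for the best predictor is missing; with 1 surrogate allowed, the second-best predictor is used instead.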

Collect sensitivity analysis data. When this check box is selected, the Pred. stats & details spreadsheet can be created from the ITrees C&RT Results dialog box - Manager tab. If there are continuous variables selected, the Sensitivity graph and the Sensitivity by rank graph can also be created from the ITrees C&RT Results dialog box - Manager tab. If this check box is cleared, the Pred. stats & details, Sensitivity, and Sensitivity by rank buttons will be dimmed.

Options for CHAID and Exhaustive CHAID

Splitting merged categories. When a predictor is used in the splitting criterion, the CHAID (or Exhaustive CHAID) analysis merges the current categories into as small a number of categories as possible in order to find a parsimonious split rule for the tree. If this check box is selected, the program will also attempt to split the merged categories again in order to arrive at an optimal grouping of categories. See also Basic Tree-Building Algorithm: CHAID and Exhaustive CHAID for details.

Bonferroni adjustment. As described in Basic Tree-Building Algorithm: CHAID and Exhaustive CHAID, at the point of selecting the best predictor for a split, the program finds the predictor with the smallest p-value (greatest statistical significance) for the set of categories for the respective predictor. Use this option to compute that p-value with the Bonferroni adjustment applied.
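
As a point of reference, the generic Bonferroni adjustment multiplies a p-value by the number of comparisons and caps the result at 1. The following Python sketch shows only that generic rule. The function name bonferroni_adjust is hypothetical, and the number of comparisons that CHAID actually uses for a given predictor is not reproduced here; see Basic Tree-Building Algorithm: CHAID and Exhaustive CHAID for how the program computes it.

```python
def bonferroni_adjust(p_value: float, n_comparisons: int) -> float:
    """Return the Bonferroni-adjusted p-value, capped at 1.0.

    n_comparisons is an assumed input here; this sketch does not derive
    it from the predictor's categories as a CHAID analysis would.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    # Multiply by the number of comparisons, but never exceed 1.
    return min(1.0, p_value * n_comparisons)


adjusted_small = bonferroni_adjust(0.01, 3)
adjusted_capped = bonferroni_adjust(0.5, 4)
```

Because the adjustment can only increase a p-value, a predictor with many possible category groupings needs a smaller unadjusted p-value to be selected than a predictor with few.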

Intervals. The Interactive Trees (C&RT, CHAID) module gives you full control over how the range of values of each continuous predictor in CHAID and Exhaustive CHAID analyses is divided into intervals. This is unlike the General CHAID Models module, which applies automatic algorithms for building trees (see Basic Tree-Building Algorithm: CHAID and Exhaustive CHAID for details; see also the Introductory Overview topic - Differences in Computational Procedures section). Use the options in the Intervals group box to determine exactly how the range of values in each continuous predictor variable is to be divided into intervals.

Automatic continuous predictor intervals in each node. Select this check box if you want STATISTICA to recompute an optimal set of (approximately equal-N) intervals for each continuous predictor in the analysis (to evaluate potential splits) at each node. If this check box is not selected, the program will determine an optimal number of intervals only once, when the data are read for the first time; thereafter, these intervals remain unchanged unless manually modified by the user. Note that the latter procedure is more efficient because it requires fewer passes through the data and, hence, may compute the final tree faster. However, the final (terminal) nodes in the analysis may then be determined from very few (sparse) intervals for the continuous predictors, leading to less than optimal splits.
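
To illustrate what "approximately equal-N" intervals means, the following Python sketch computes cut points that divide a set of values into groups of roughly equal size. It is only an illustration under stated assumptions: the function name equal_n_boundaries, the choice of midpoints as boundaries, and the handling of tied values are not taken from STATISTICA, which determines the number and placement of intervals by its own rules.

```python
from typing import List, Sequence


def equal_n_boundaries(values: Sequence[float], n_intervals: int) -> List[float]:
    """Return up to n_intervals - 1 cut points giving roughly equal-N groups.

    Each cut point is placed midway between two neighboring sorted values
    (an assumption of this sketch, not a documented program rule).
    """
    if n_intervals < 2:
        raise ValueError("n_intervals must be at least 2")
    ordered = sorted(values)
    if len(ordered) < n_intervals:
        raise ValueError("need at least one value per interval")

    boundaries: List[float] = []
    for k in range(1, n_intervals):
        # Index of the first value belonging to interval k (0-based).
        idx = k * len(ordered) // n_intervals
        cut = (ordered[idx - 1] + ordered[idx]) / 2
        # Skip a cut point that repeats the previous one (tied values).
        if not boundaries or cut > boundaries[-1]:
            boundaries.append(cut)
    return boundaries


cuts = equal_n_boundaries(range(1, 13), n_intervals=4)
```

For the twelve values 1 through 12 and four intervals, each interval receives three values. If the same boundaries were reused in a node containing only the values 1 through 4, most of the intervals would be empty there, which is the sparseness that recomputing intervals at each node avoids.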

Continuous predictor intervals. This option is available only if the Automatic continuous predictor intervals in each node check box is not selected. In this case, you can manually modify the intervals that are used for evaluating potential splits on each continuous predictor (for the CHAID tree). Select the desired continuous predictor variable from the drop-down list, and then click the Intervals button to display the Specify Boundaries dialog box, where you can modify the default intervals created for the selected continuous predictor.