Computational Methods - Determining When to Stop Splitting

The third step in classification tree analysis is to determine when to stop splitting. One characteristic of classification trees is that if no limit is placed on the number of splits that are performed, eventually "pure" classification will be achieved, with each terminal node containing only one class of cases or objects. However, "pure" classification is usually unrealistic. Even a simple classification tree such as a coin sorter can produce impure classifications for coins whose sizes are distorted or if wear changes the lengths of the slots cut in the track. This potentially could be remedied by further sorting of the coins that fall into each slot, but to be practical, at some point the sorting would have to stop and you would have to accept that the coins have been reasonably well sorted.

Likewise, if the observed classifications on the dependent variable or the levels on the predicted variable in a classification tree analysis are measured with error or contain "noise," it is unrealistic to continue to sort until every terminal node is "pure." In the Classification Trees module, there are two options available for controlling when splitting stops. These two options are linked to the choice of the stopping rule specified for the analysis.

Minimum n. One option for controlling when splitting stops is to allow splitting to continue until all terminal nodes are pure or contain no more than a specified minimum number of cases or objects. This option is available when either Prune on misclassification error or Prune on deviance has been selected as the Stopping rule for the analysis. The desired minimum number of cases can be specified as the Minimum n, and splitting will stop when all terminal nodes containing more than one class have no more than the specified number of cases or objects.

Fraction of objects. The other option for controlling when splitting stops is to allow splitting to continue until all terminal nodes are pure or contain no more cases than a specified minimum fraction of the sizes of one or more classes. This option is available when FACT-style direct stopping has been selected as the Stopping rule for the analysis (FACT is a classification tree program developed by Loh and Vanichestakul, 1988, that is a precursor of QUEST). The desired minimum fraction can be specified as the fraction of objects and, if the priors used in the analysis are equal and class sizes are equal, splitting will stop when all terminal nodes containing more than one class have no more cases than the specified fraction of the class sizes for one or more classes. If the priors used in the analysis are not equal, splitting will stop when all terminal nodes containing more than one class have no more cases than the specified fraction for one or more classes.