Classification
Trees - Notes on Computational Algorithms
Discriminant-based
splits. The algorithms used to perform discriminant-based
splits are adapted from QUEST
(Quick, Unbiased, Efficient Statistical Trees). QUEST is a classification
tree program developed by Loh and Shih (1997) that employs a modification
of recursive quadratic discriminant analysis and includes a number of
innovative features for improving the reliability and efficiency of the
classification trees that it computes.
For univariate splits, QUEST uses statistical tests to select the variables
to perform splits. To find the splits, the 2-means clustering algorithm
of Hartigan and Wong (1979, see also Cluster Analysis),
singular value decomposition methods (Schneider & Barker, 1973), and
recursive quadratic discriminant analysis techniques (McLachlan, 1992)
are used. Similar algorithms are used for linear combination splits. Loh
and Shih (1997) provide detailed descriptions of the algorithms used in
QUEST.
C&RT-style
search for univariate splits. The C&RT-style
univariate split selection method is an adaption of the algorithms
used in C&RT, as described by Breiman et al.
(1984). C&RT (Classification And Regression Trees) is a classification
tree program that uses an exhaustive grid search of all possible univariate
splits to find the splits for a classification tree. For categorical predictor
variables with k levels present at a node, there are 2(k
-1) - 1 possible contrasts between two sets of levels of the predictor.
For ordered predictors with k distinct levels present at a node, there
are k -1 midpoints between distinct levels. Thus it can be seen that the
number of possible splits that must be examined can become very large
when there are large numbers of predictors with many levels which must
be examined at many nodes. The C&RT-style exhaustive search for univariate
splits option works by searching for the split that maximizes the reduction
in the value of the selected goodness of fit measure. C&RT split selections
methods are thoroughly documented in Breiman et. al. (1984).
Direct
stopping. The FACT-style
direct stopping
rule option employs techniques used in FACT. FACT is a classification
tree program developed by Loh and Vanichestakul (1988) that is a precursor
of QUEST. The use
of direct stopping as a stopping rule for classification tree is discussed
in Loh and Vanichestakul (1988).
Minimal cost-complexity cross-validation pruning.
The pruning
stopping rule options employ techniques used in C&RT, as described
in Breiman et al. (1984).