Gini Measure of Node Impurity

In Classification and Regression Trees (e.g., GC&RT, Interactive Trees), the default goodness-of-fit measure for classification problems is the Gini measure. In addition, options are provided for specifying the prior classification probabilities (or Priors). The choice of prior probabilities can affect the splits that are chosen for the final tree, and can greatly affect the accuracy of the final C&RT model for predicting particular classes. What follows is a discussion and explanation of these issues.

Prior Probabilities and the Gini Measure of Node Impurity

According to Breiman, Friedman, Olshen, & Stone (1984), the Gini measure of node impurity at node t (which STATISTICA uses by default in GC&RT and, therefore, Boosted Trees) is defined to be (pp. 28 & 38)

g ( t ) = Σ_{i ≠ j} p ( i | t ) p ( j | t ) = 1 - Σ_j p ( j | t )²

where

p ( j | t ) = p ( j , t ) / p ( t )

and

p ( j , t ) = p ( j ) N j ( t ) / N j

such that

p ( t ) = Σ_j p ( j , t )

p ( j | t ) is the estimated probability that an observation belongs to group j given that it is in node t,

p ( j , t ) is the estimated probability that an observation is in group j and at node t ,

p ( t ) is the estimated probability that an observation is at node t,

p ( j ) is the prior probability for group j,

N j ( t ) is the number of group j members at node t,

and N j is the size of group j.
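As a concrete illustration, the definitions above can be sketched in Python. This is not STATISTICA code, and the group sizes, node counts, and priors below are hypothetical:

```python
def gini(node_counts, group_sizes, priors):
    """Gini impurity g(t) = 1 - sum_j p(j|t)^2, with explicit priors p(j)."""
    # p(j, t) = p(j) * N_j(t) / N_j
    p_jt = [p * njt / nj for p, njt, nj in zip(priors, node_counts, group_sizes)]
    # p(t) = sum_j p(j, t)
    p_t = sum(p_jt)
    # p(j | t) = p(j, t) / p(t)
    p_j_given_t = [q / p_t for q in p_jt]
    return 1.0 - sum(q ** 2 for q in p_j_given_t)

# Hypothetical example: group sizes 900 and 100; a node holds 90 cases of each.
print(gini([90, 90], [900, 100], [0.9, 0.1]))  # priors estimated from the data: ≈ 0.50
print(gini([90, 90], [900, 100], [0.5, 0.5]))  # equal priors: ≈ 0.18
```

Note how the same node counts yield very different impurities: equal priors weight each member of the small group more heavily, so a node holding equal counts from both groups looks maximally impure under data-estimated priors but fairly pure under equal priors.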

Therefore, the prior probabilities play a role in every Gini measure computation at every node. However, Breiman et al. also note that, when the prior probabilities are estimated from the data (i.e., p ( j ) = N j / N ),

p ( j | t ) = N j ( t ) / N ( t )

that is, p ( j | t ) reduces to the simple proportion of group j cases at node t. This fact can cause higher misclassification rates for under-represented groups (see Prior Probabilities and Misclassification Costs below).
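This reduction is easy to verify numerically. In the sketch below (hypothetical counts, not STATISTICA code), data-estimated priors p ( j ) = N j / N make p ( j | t ) equal the raw class proportions within the node:

```python
def p_j_given_t(node_counts, group_sizes, priors):
    # p(j, t) = p(j) * N_j(t) / N_j ; p(t) = sum_j p(j, t)
    p_jt = [p * njt / nj for p, njt, nj in zip(priors, node_counts, group_sizes)]
    p_t = sum(p_jt)
    return [q / p_t for q in p_jt]

group_sizes = [900, 100]                       # N_j for the two groups
node_counts = [30, 20]                         # N_j(t) at some node t
n = sum(group_sizes)
data_priors = [nj / n for nj in group_sizes]   # p(j) = N_j / N

n_t = sum(node_counts)
node_proportions = [njt / n_t for njt in node_counts]  # N_j(t) / N(t)

print(p_j_given_t(node_counts, group_sizes, data_priors))  # ≈ [0.6, 0.4]
print(node_proportions)                                    # [0.6, 0.4]
```

With data-estimated priors, then, the group sizes drop out entirely, and a small group can never dominate a node unless it dominates the node's raw counts.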

Prior Probabilities and Misclassification Costs

When non-uniform misclassification costs are specified for the GC&RT analysis, the Gini measure is modified to account for these costs (p. 113):

g ( t ) = Σ_{i ≠ j} C ( i | j ) p ( i | t ) p ( j | t )

where C ( i | j ) is the cost of misclassifying an observation in class j as belonging to class i. This feature enables the user to effectively penalize certain types of misclassifications in the analysis. However, as noted above in Prior Probabilities and the Gini Measure of Node Impurity, p ( j | t ) is a function of p ( j ), the prior probability for class j. Therefore, for a given C ( i | j ) and p ( j ), one can find C ' ( i | j ) and p ' ( j ), such that

C ' ( i | j ) p ' ( j ) = C ( i | j ) p ( j )

Consequently, if C ' ( i | j ) is taken to be unity for all i ≠ j and p ' ( j ) can be found such that the above relationship is satisfied, then this adjustment of the prior probabilities can have the same net effect as the specification of non-uniform misclassification costs. This property can be readily observed in classification problems where one of the classes is underrepresented in the data. In this case, for uniform misclassification costs, prior probabilities that are estimated from the sample proportions will produce a model that tends to under-perform with respect to the underrepresented class. However, if one increases the prior probability for the underrepresented class, then the model will tend to do a better job of classifying cases in this group.
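This effect can be sketched numerically. In the hypothetical two-class example below (illustrative only, not STATISTICA code), the misclassification costs are folded into adjusted priors proportional to C ( i | j ) p ( j ) and the costs are reverted to unity; the adjusted priors shift p ( j | t ) toward the underrepresented, costly-to-miss class, changing which class dominates the node:

```python
def p_j_given_t(node_counts, group_sizes, priors):
    # p(j, t) = p(j) * N_j(t) / N_j ; p(j | t) = p(j, t) / p(t)
    p_jt = [p * njt / nj for p, njt, nj in zip(priors, node_counts, group_sizes)]
    p_t = sum(p_jt)
    return [q / p_t for q in p_jt]

group_sizes = [900, 100]           # class 2 is underrepresented
node_counts = [60, 40]             # N_j(t) at some node t
priors = [0.9, 0.1]                # estimated from the data: N_j / N
cost = {(1, 2): 5.0, (2, 1): 1.0}  # C(1|2) = 5: missing class 2 is costly

# Adjusted priors: p'(1) proportional to C(2|1) p(1),
#                  p'(2) proportional to C(1|2) p(2), renormalized to sum to 1.
raw = [cost[(2, 1)] * priors[0], cost[(1, 2)] * priors[1]]
adj_priors = [r / sum(raw) for r in raw]

print(p_j_given_t(node_counts, group_sizes, priors))      # class 1 dominates
print(p_j_given_t(node_counts, group_sizes, adj_priors))  # class 2 now ahead
```

Under the data-estimated priors, class 1 dominates the node and its cases would be classified accordingly; with the cost-adjusted priors and unit costs, class 2 has the higher p ( j | t ), just as if the non-uniform costs had been applied directly.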