C&RT Quick Specs - Classification Tab

The Classification tab of the C&RT Quick specs dialog box contains three group boxes: Misclassification costs, Goodness of fit, and Prior probabilities. Use the options in these boxes to choose the misclassification cost (Equal or User spec.), the goodness of fit measure (Gini measure, Chi-square, or G-square), and the prior probabilities (Estimated, Equal, or User specified) for the General Classification/Regression Trees analysis. Note that these options are available only if the dependent (criterion) variable in the current analysis is categorical, and if the Categorical response check box on the Quick tab is selected, i.e., if the goal of the current analysis is to correctly classify cases (observations) into the groups specified in the dependent variable.

Misclassification costs. This option is used to assign greater importance to the accurate prediction (classification) of some classes as compared to others. For example, in medical research you may want to assign greater importance to the accurate classification of malignant tumors than to accurate discrimination between different types of benign forms. In that case, you would assign greater costs to the misclassification of malignant tumors, and lower costs to the misclassification of benign tumors. Note also that, as illustrated in this example, the matrix of misclassification costs does not have to be symmetric, and in fact it rarely is (i.e., it is more costly to misclassify malignant tumors as benign than the other way around).

Equal. If you select the Equal option button, each off-diagonal element of the predicted class (row) by observed class (column) misclassification costs matrix is set equal to 1.0, and the specified prior probabilities for the classes on the dependent variable are not adjusted.

User spec. Select the User specified option button if more accurate classification is desired for some classes than others. Note that this option is available only if you have checked the Categorical response check box and have selected the codes by clicking the Response codes button on the Quick tab. When you click this button, the User-Specified Misclassification Costs dialog box is displayed.
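The two cost options can be illustrated with a minimal Python sketch. It is not STATISTICA code: the function name, the class labels, and the cost value 5.0 are hypothetical, and setting the diagonal to 0.0 (no cost for a correct classification) is an assumption, since the text above describes only the off-diagonal elements. Rows are predicted classes and columns are observed classes, as in the description of the Equal option.

```python
def equal_cost_matrix(class_labels):
    """Build a predicted (row) by observed (column) cost matrix.

    Off-diagonal elements are 1.0; the 0.0 diagonal is an assumption
    (a correct classification is taken to cost nothing).
    """
    return [
        [0.0 if predicted == observed else 1.0 for observed in class_labels]
        for predicted in class_labels
    ]


# Hypothetical class labels for the tumor example.
labels = ["benign_a", "benign_b", "malignant"]
malignant = labels.index("malignant")

# "Equal" option: every misclassification costs the same.
equal_costs = equal_cost_matrix(labels)

# "User spec." option: start from equal costs, then raise the cost of
# predicting a benign class when the observed class is malignant.
# The value 5.0 is purely illustrative.
user_costs = equal_cost_matrix(labels)
for predicted in range(len(labels)):
    if predicted != malignant:
        user_costs[predicted][malignant] = 5.0

# The user-specified matrix is not symmetric: calling a malignant tumor
# benign costs 5.0, while calling a benign tumor malignant costs 1.0.
is_symmetric = all(
    user_costs[i][j] == user_costs[j][i]
    for i in range(len(labels))
    for j in range(len(labels))
)
```

In this sketch `is_symmetric` is false for the user-specified matrix, which mirrors the remark that cost matrices rarely are symmetric.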

Goodness of fit. These options pertain to the goodness of fit measure that is used as the criterion for selecting the best split from the set of candidate splits. You can choose one of three measures: Gini measure, Chi-square, or G-square. See also Computational Details and Breiman et al. (1984) for details concerning these measures.

Gini measure. The Gini measure (see also Computational Formulas) is a measure of the impurity of a node and can be used as a measure of goodness of fit to compute the "right-sized" tree. With priors estimated from class sizes and equal misclassification costs, the Gini measure is computed as the sum of products of all pairs of class proportions for classes present at the node. This measure reaches its maximum value when class sizes at the node are equal, and reaches a value of zero when only one class is present at a node (and hence, when the classification for the observed data is perfect). The Gini measure is the commonly preferred measure of goodness of fit (e.g., Breiman et al., 1984).
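As a rough illustration of the computation described above, the following Python sketch evaluates the Gini measure from the class counts at a node, for the case of priors estimated from class sizes and equal misclassification costs. The function name is hypothetical. The sketch assumes that "all pairs" means ordered pairs of distinct classes, which makes the result equal to 1 minus the sum of squared class proportions; if unordered pairs are intended, the value is halved, but the maximum and zero points described above are unchanged.

```python
from itertools import permutations


def gini_measure(class_counts):
    """Gini impurity of a node from its per-class case counts.

    Assumes estimated priors and equal misclassification costs, and
    sums p_i * p_j over ordered pairs of distinct classes (i != j).
    """
    total = sum(class_counts)
    if total <= 0:
        raise ValueError("the node must contain at least one case")
    # Only classes actually present at the node contribute.
    proportions = [count / total for count in class_counts if count > 0]
    return sum(p * q for p, q in permutations(proportions, 2))


pure_node = gini_measure([10, 0])        # only one class present
balanced_node = gini_measure([5, 5])     # equal class sizes
skewed_node = gini_measure([9, 1])       # unequal class sizes
```

Here `pure_node` is zero, and `balanced_node` is larger than `skewed_node`, in line with the statement that the measure peaks when class sizes at the node are equal.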

Chi-square. The Chi-square option is similar to the standard Chi-square value computed for the expected and observed classifications (with priors adjusted for misclassification cost).

G-square. The G-square option is similar to the maximum-likelihood Chi-square (as, for example, computed in the Log-Linear module).
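For orientation, the sketch below computes the two standard statistics that these options are said to resemble: the Pearson Chi-square and the maximum-likelihood (likelihood-ratio) Chi-square, for a table of observed counts with one row per child node of a candidate split and one column per class. It is only a sketch of the textbook statistics, not the formula STATISTICA uses: it ignores priors and misclassification costs, and the function name and table layout are assumptions.

```python
import math


def split_fit_statistics(table):
    """Return (Pearson Chi-square, G-square) for a table of counts.

    table[i][j] is the number of cases of class j in child node i.
    Expected counts assume that node membership and class are independent.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if total <= 0:
        raise ValueError("the table must contain at least one case")

    chi_square = 0.0
    g_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                continue  # empty row or column contributes nothing
            chi_square += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero cell contributes 0 to G-square
                g_square += 2.0 * observed * math.log(observed / expected)
    return chi_square, g_square


# A split that separates two classes perfectly, and one that does not help.
perfect_chi, perfect_g = split_fit_statistics([[10, 0], [0, 10]])
useless_chi, useless_g = split_fit_statistics([[5, 5], [5, 5]])
```

Both statistics are zero for the uninformative split and large for the perfect one, so under either measure a larger value indicates a better candidate split.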

Prior probabilities. This option is used to specify how likely it is, without using any prior knowledge of the values for the predictor variables in the model, that a case or object will fall into one of the classes. The Prior probabilities group box contains three options for this purpose: Estimated, Equal, and User specified. Note that the User specified option button is available only after you have selected the specific Response codes for the dependent variable on the Quick tab of the specification dialog.

Estimated. Select the Estimated option button to specify that the likelihood that a case or object will fall into one of the classes is proportional to the dependent variable class sizes. See also the respective options for the Classification Trees Analysis module for additional details.

Equal. Select the Equal option button to specify that the likelihood that a case or object will fall into one of the classes is the same for all dependent variable classes. See also the respective options for the Classification Trees Analysis module for more details.

User specified. Select the User specified option button if you have specific knowledge about the base rates (for example, based on previous research). When you select this option button, the Enter values for the prior probabilities dialog box will be displayed, in which you can specify the a priori probabilities for each class of the dependent variable. This dialog is automatically displayed only the first time that priors are set to user defined (i.e., the User specified option button is selected); thereafter, click the accompanying button to display the dialog containing the previously specified values. If the probabilities do not add up to 1.0, STATISTICA will automatically adjust them proportionately.
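The three ways of setting priors can be sketched in Python as follows. The function names and example data are hypothetical, and the sketch assumes that "adjust them proportionately" means dividing each user-specified value by the sum of all values so that the result adds up to 1.0.

```python
from collections import Counter


def estimated_priors(observed_classes):
    """Priors proportional to the class sizes in the data (Estimated)."""
    counts = Counter(observed_classes)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def equal_priors(class_labels):
    """The same prior for every class (Equal)."""
    return {label: 1.0 / len(class_labels) for label in class_labels}


def normalize_priors(user_priors):
    """Rescale user-specified priors so that they sum to 1.0.

    Dividing by the sum is an assumption about how the proportional
    adjustment is carried out.
    """
    total = sum(user_priors.values())
    if total <= 0:
        raise ValueError("priors must sum to a positive value")
    return {label: value / total for label, value in user_priors.items()}


# Hypothetical dependent variable with three classes.
cases = ["a", "a", "b", "c"]
estimated = estimated_priors(cases)
equal = equal_priors(["a", "b", "c"])
adjusted = normalize_priors({"a": 2.0, "b": 1.0, "c": 1.0})
```

With these inputs the estimated priors follow the class sizes (half of the cases are in class "a"), the equal priors are one third each, and the user-specified values 2, 1, 1 are rescaled to 0.5, 0.25, 0.25.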