Cross-Validation

Cross-validation refers to the process of assessing the predictive accuracy of a model in a test sample (sometimes also called a cross-validation sample) relative to its predictive accuracy in the learning sample from which the model was developed. Ideally, with a large sample size, a proportion of the cases (perhaps one-half or two-thirds) can be designated as the learning sample and the remaining cases as the test sample. The model is then developed using only the cases in the learning sample, and its predictive accuracy is assessed using the cases in the test sample. If the model performs as well in the test sample as in the learning sample, it is said to cross-validate well, or simply to cross-validate. A sketch of this procedure is given below. For discussions of this type of test sample cross-validation, see the Computational Methods section of the Classification Trees Overviews, the Classification section of the Discriminant Analysis Introductory Overview, and Data Mining.
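
The following is a minimal illustrative sketch of test-sample cross-validation, not the procedure of any particular module. The classifier, the example data set, and the two-thirds/one-third split are assumptions chosen only to make the example self-contained.

```python
# Illustrative sketch of test-sample cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Designate two-thirds of the cases as the learning sample and the
# remaining one-third as the test sample (split proportions are assumed).
X_learn, X_test, y_learn, y_test = train_test_split(
    X, y, test_size=1 / 3, random_state=0
)

# Develop the model using only the cases in the learning sample.
model = DecisionTreeClassifier(random_state=0).fit(X_learn, y_learn)

# Assess predictive accuracy in the learning sample and in the test sample.
learn_accuracy = model.score(X_learn, y_learn)
test_accuracy = model.score(X_test, y_test)
print(f"learning-sample accuracy: {learn_accuracy:.3f}")
print(f"test-sample accuracy:     {test_accuracy:.3f}")
# If test-sample accuracy is close to learning-sample accuracy,
# the model is said to cross-validate well.
```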

A variety of techniques have been developed for performing cross-validation with small sample sizes by constructing test samples and learning samples that are partly but not wholly independent; one common example is sketched below. For a discussion of some of these techniques, see the Computational Methods section of the Classification Trees Overviews.
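
As an illustration of one such technique, the sketch below uses v-fold cross-validation, in which the cases are divided into v folds and each fold serves once as the test sample while the remaining folds form the learning sample; the successive learning samples therefore overlap and are not wholly independent. The classifier, the data set, and the choice of v = 5 are assumptions made only for the example.

```python
# Illustrative sketch of v-fold cross-validation for small samples.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

fold_accuracies = []
# Each fold serves once as the test sample; the remaining folds
# together form the learning sample for that repetition.
for learn_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = DecisionTreeClassifier(random_state=0).fit(X[learn_idx], y[learn_idx])
    fold_accuracies.append(model.score(X[test_idx], y[test_idx]))

# The average accuracy across folds estimates how well the model
# predicts cases that were not used to develop it.
print(f"mean cross-validated accuracy: {np.mean(fold_accuracies):.3f}")
```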