Cross-Validation
Cross-validation refers to the process of assessing the predictive accuracy of a model in a test sample (sometimes also called a cross-validation sample) relative to its predictive accuracy in the learning sample from which the model was developed. Ideally, with a large sample size, a proportion of the cases (perhaps one-half or two-thirds) can be assigned to the learning sample and the remaining cases to the test sample. The model is developed using the cases in the learning sample, and its predictive accuracy is then assessed using the cases in the test sample. If the model performs as well in the test sample as in the learning sample, it is said to cross-validate well, or simply to cross-validate.
For discussions of this type of test sample cross-validation, see the Computational Methods section of the Classification Trees Overviews, the Classification section of the Discriminant Analysis Introductory Overview, and Data Mining.
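
The following sketch illustrates this test sample procedure, assuming the scikit-learn library is available; the dataset, model (a decision tree classifier), and two-thirds/one-third split are illustrative choices, not part of the original text.

# A minimal sketch of test sample cross-validation, assuming scikit-learn.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative data set standing in for real cases.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)

# Designate two-thirds of the cases as the learning sample and the
# remaining one-third as the test (cross-validation) sample.
X_learn, X_test, y_learn, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0)

# Develop the model using only the learning sample.
model = DecisionTreeClassifier(random_state=0).fit(X_learn, y_learn)

# Compare predictive accuracy in the learning sample and the test sample;
# comparable accuracies suggest the model cross-validates well.
print("Learning-sample accuracy:", model.score(X_learn, y_learn))
print("Test-sample accuracy:    ", model.score(X_test, y_test))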
A variety of techniques have been developed for performing cross-validation with small sample sizes by constructing learning and test samples that are partly, but not wholly, independent. For a discussion of some of these techniques, see the Computational Methods section of the Classification Trees Overviews.
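
One commonly used technique of this kind is v-fold (k-fold) cross-validation, in which each case serves in the learning sample for some folds and in the test sample for exactly one fold, so the samples are partly, but not wholly, independent. The sketch below assumes scikit-learn; the small data set, the model, and the choice of five folds are illustrative.

# A minimal sketch of v-fold cross-validation, assuming scikit-learn.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# A deliberately small illustrative data set.
X, y = make_classification(n_samples=60, n_features=8, random_state=0)

# Fit and score the model on 5 rotating learning/test partitions.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print("Per-fold test accuracies:", scores)
print("Mean cross-validated accuracy:", scores.mean())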