Data Reduction

The term Data Reduction is used in two distinct senses:

Data Reduction by decreasing the dimensionality (exploratory multivariate statistics). In this sense, Data Reduction refers to analytic methods (typically multivariate exploratory techniques such as Factor Analysis, Multidimensional Scaling, Cluster Analysis, Canonical Correlation, or Neural Networks) that reduce the dimensionality of a data set by extracting a smaller number of underlying factors, dimensions, clusters, etc., that can account for the variability in the (multidimensional) data set. For example, in a poorly designed questionnaire, the responses that participants give on a large number of variables (scales, questions, or dimensions) might be explained by a very limited number of "trivial" or artificial factors. Two such underlying factors could be: 1) the respondent's attitude toward the study (positive or negative) and 2) the "social desirability" factor (a response bias representing a tendency to respond in a socially desirable manner).
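
The questionnaire example can be illustrated with a minimal sketch. It is not a full Factor Analysis; it uses principal components, found by power iteration with deflation, as one simple stand-in for the techniques listed above. The simulated data, the loadings (1.0 and 0.7), the noise level, and all function names are assumptions made only for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Simulated questionnaire (hypothetical): 500 respondents, 6 items.
# Items 0-2 are driven by "attitude", items 3-5 by "social desirability".
rows = []
for _ in range(500):
    attitude = rng.gauss(0, 1)
    desirability = rng.gauss(0, 1)
    rows.append(
        [attitude + rng.gauss(0, 0.3) for _ in range(3)]
        + [0.7 * desirability + rng.gauss(0, 0.3) for _ in range(3)]
    )


def covariance_matrix(data):
    """Sample covariance matrix of the columns of `data`."""
    n, p = len(data), len(data[0])
    means = [sum(r[j] for r in data) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in data]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def matvec(m, v):
    """Multiply matrix `m` by vector `v`."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_eigenpair(m, iterations=300):
    """Largest eigenvalue and its eigenvector, by power iteration."""
    v = [rng.gauss(0, 1) for _ in m]
    for _ in range(iterations):
        w = matvec(m, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(m, v)))
    return eigenvalue, v


cov = covariance_matrix(rows)
p = len(cov)
total_variance = sum(cov[i][i] for i in range(p))

# Extract two components, removing each one from the matrix (deflation)
# before searching for the next.
eigenvalues = []
residual = [row[:] for row in cov]
for _ in range(2):
    value, vector = top_eigenpair(residual)
    eigenvalues.append(value)
    residual = [
        [residual[i][j] - value * vector[i] * vector[j] for j in range(p)]
        for i in range(p)
    ]

explained = sum(eigenvalues) / total_variance
print(f"Share of variance carried by two components: {explained:.2f}")
```

Under these simulated settings, two components carry most of the variance of the six items, which is the sense in which six observed variables "reduce" to two underlying dimensions.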

Data Reduction by unbiased reduction of the sample size (exploratory graphics). In this sense, Data Reduction is applied in exploratory graphical analysis of extremely large data sets. In such data sets, the sheer density of point markers or lines can obscure an existing pattern (especially in large line graphs or scatterplots). It can then be useful to plot only a representative subset of the data (for example, a random sample) so that the pattern, which is obscured but still reliable, is no longer hidden by the number of point markers.
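
A minimal sketch of this kind of reduction, assuming a simple random sample without replacement as the "unbiased" selection rule. The simulated data set, its size, the sample size of 2,000, and the helper `slope` are hypothetical; the plotting step itself is left to whatever graphics tool is in use.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical large data set: 100,000 (x, y) points with a linear
# trend (slope 0.5) buried in noise.
points = []
for _ in range(100_000):
    x = rng.uniform(0, 10)
    points.append((x, 0.5 * x + rng.gauss(0, 1)))

# Simple random sample without replacement: every point has the same
# chance of being kept, so the subset is not biased toward any region.
subset = rng.sample(points, 2_000)


def slope(pairs):
    """Least-squares slope of y on x, used here to compare the patterns."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx


full_slope = slope(points)
subset_slope = slope(subset)
print(f"slope in full data: {full_slope:.3f}")
print(f"slope in subset:    {subset_slope:.3f}")
```

Plotting `subset` instead of `points` draws 2% of the markers, while the comparison of slopes gives a rough check that the trend in the subset still reflects the trend in the full data.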

See also Data Mining.