Correlations Introductory Overview - Casewise vs. Pairwise Deletion of Missing Data

The default way of handling missing data when calculating a correlation matrix is to exclude every case that has a missing value in at least one of the selected variables; that is, casewise deletion of missing data. Only this method yields a "true" correlation matrix, in which all correlations are computed from the same set of observations. However, if missing data are randomly distributed across cases, you could easily end up with no "valid" cases in the data set, because each case will have at least one missing value in some variable. The most common solution in such instances is so-called pairwise deletion of missing data, where the correlation between each pair of variables is calculated from all cases that have valid data on those two variables. In many instances there is nothing wrong with that method, especially when the total percentage of missing data is low (say, 10%) and the missing values are relatively randomly distributed across cases and variables. However, it may sometimes lead to serious problems.
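The difference between the two methods can be sketched in a few lines of Python. This is only an illustration and is not STATISTICA code: it assumes the pandas library, and the data values and the column names a, b, and c are made up for the example. In pandas, dropping incomplete rows before calling corr() gives casewise deletion, while calling corr() directly uses all cases that are valid for each pair of columns.

```python
import numpy as np
import pandas as pd

# Hypothetical data: each of the last three cases is missing one value.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, np.nan],
    "b": [2.0, 1.0, 4.0, 3.0, np.nan, 6.0],
    "c": [1.0, 3.0, 2.0, np.nan, 5.0, 6.0],
})

# Casewise deletion: keep only cases with valid data on every variable,
# so every coefficient is based on the same set of observations.
complete = df.dropna()
casewise = complete.corr()

# Pairwise deletion: DataFrame.corr() excludes missing values separately
# for each pair of columns, so each coefficient may use different cases.
pairwise = df.corr()

print("complete cases:", len(complete), "of", len(df))
print(casewise.round(3))
print(pairwise.round(3))
```

Here only three of the six cases survive casewise deletion, whereas each pairwise coefficient is based on four cases, which is why the two matrices differ.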

For example, a systematic bias may result from a "hidden" systematic distribution of missing data, which causes different correlation coefficients in the same correlation matrix to be based on different subsets of subjects. In addition to the possibly biased conclusions that you could draw from such "pairwise calculated" correlation matrices, real problems may occur when you subject such matrices to another analysis (e.g., multiple regression, factor analysis, or cluster analysis) that expects a "true" correlation matrix, with a certain level of consistency and "transitivity" between different coefficients. A pairwise matrix may turn out not to be a "true" correlation matrix, in which case the other program will either be unable to process it or will give erroneous results. (Note that in STATISTICA you can either save a matrix in Basic Statistics and Tables and access it with another program, or calculate a matrix casewise or pairwise in the respective program.) Thus, if you are using the pairwise method of deleting missing data, be sure to examine the distribution of missing data across the cells of the matrix for possible systematic "patterns."
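One way to make these checks concrete is sketched below in Python. It is an illustration under stated assumptions rather than a STATISTICA procedure: it assumes pandas and NumPy, the data and the variable names x, y, and z are invented, and the missing values are deliberately arranged so that each pair of variables is observed on a different block of cases. The sketch counts the valid cases behind each coefficient and then tests whether the pairwise matrix has a negative eigenvalue, which a matrix computed from a single set of observations cannot have.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Hypothetical data with a systematic missing-data pattern:
# each pair of variables is observed on a different block of three cases.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, nan, nan, nan],
    "y": [1.0, 2.0, 3.0, nan, nan, nan, 1.0, 2.0, 3.0],
    "z": [nan, nan, nan, 1.0, 2.0, 3.0, 3.0, 2.0, 1.0],
})

# Casewise deletion leaves nothing: every case has one missing value.
n_complete = len(df.dropna())

# Pairwise-deleted correlation matrix.
pairwise = df.corr()

# Number of valid cases behind each coefficient (cell of the matrix).
valid = df.notna().astype(int)
pair_counts = valid.T @ valid

# A matrix computed from one common set of cases has no negative
# eigenvalues; a clearly negative one signals an inconsistent matrix.
eigenvalues = np.linalg.eigvalsh(pairwise.to_numpy())
is_valid_correlation_matrix = eigenvalues.min() >= -1e-8  # tolerance is an arbitrary choice

print("complete cases:", n_complete)
print(pair_counts)
print(pairwise.round(3))
print("smallest eigenvalue:", round(float(eigenvalues.min()), 3))
print("usable as a correlation matrix:", is_valid_correlation_matrix)
```

In this constructed example x correlates perfectly positively with both y and z, yet y and z correlate perfectly negatively, a combination that no single set of observations could produce. The smallest eigenvalue is therefore negative, and a procedure such as factor analysis that expects a "true" correlation matrix could fail or return misleading results.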