Correlations Introductory Overview - How to Identify Biases Caused by Pairwise Deletion of Missing Data

The Summary: Correlation matrix button (on the Advanced/Plot tab of the Product-Moment and Partial Correlations dialog) produces spreadsheets with the n's, means, and standard deviations computed separately for "each variable with each variable," that is, from the subset of cases used to compute each individual correlation coefficient. If pairwise deletion of missing data does not introduce a systematic bias into the correlation matrix, then all of the pairwise descriptive statistics for a given variable should be very similar. If they differ markedly, there is good reason to suspect a bias. For example, suppose the mean (or standard deviation) of the values of variable A used to calculate its correlation with variable B is much lower than the mean (or standard deviation) of the values of A used to calculate its correlation with variable C. In that case we would have good reason to suspect that those two correlations (A-B and A-C) are based on systematically different subsets of cases, and thus that the correlation matrix is biased by a non-random pattern of missing data.
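The same check can be sketched outside the program. The Python below is a minimal illustration under stated assumptions, not how the Summary: Correlation matrix button is implemented. It assumes the data are a list of dicts in which None marks a missing value. The function names (`pairwise_descriptives`, `suspect_variables`), the toy data, and the thresholds (`mean_tol`, expressed in units of the variable's overall standard deviation, and `sd_ratio_tol`) are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]

def mean_sd(values: List[float]) -> Tuple[float, float]:
    """Sample mean and standard deviation (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean, sd

def pairwise_descriptives(data: List[Row], variables: List[str]):
    """For each ordered pair (a, b), describe variable `a` using only the cases
    where both `a` and `b` are present, i.e. the cases pairwise deletion would
    use for the a-b correlation. Returns {(a, b): (n, mean, sd)}."""
    table = {}
    for a in variables:
        for b in variables:
            if a == b:
                continue
            vals = [row[a] for row in data
                    if row[a] is not None and row[b] is not None]
            if len(vals) >= 2:
                m, s = mean_sd(vals)
                table[(a, b)] = (len(vals), m, s)
    return table

def suspect_variables(data: List[Row], variables: List[str],
                      mean_tol: float = 0.5, sd_ratio_tol: float = 1.5) -> List[str]:
    """Flag variables whose pairwise means or SDs disagree by more than the
    (hypothetical) tolerances."""
    table = pairwise_descriptives(data, variables)
    flagged = []
    for a in variables:
        stats = [v for (x, _), v in table.items() if x == a]
        if len(stats) < 2:
            continue  # nothing to compare
        # Reference scale: SD of all available values of `a`.
        _, ref_sd = mean_sd([row[a] for row in data if row[a] is not None])
        means = [m for _, m, _ in stats]
        sds = [s for _, _, s in stats]
        mean_spread = (max(means) - min(means)) / ref_sd if ref_sd > 0 else 0.0
        sd_ratio = max(sds) / min(sds) if min(sds) > 0 else math.inf
        if mean_spread > mean_tol or sd_ratio > sd_ratio_tol:
            flagged.append(a)
    return flagged

# Toy data: B is missing exactly for the cases with high values of A.
data: List[Row] = [
    {"A": 1.0, "B": 2.0, "C": 5.0},
    {"A": 2.0, "B": 1.0, "C": 4.0},
    {"A": 3.0, "B": 4.0, "C": 3.0},
    {"A": 8.0, "B": None, "C": 2.0},
    {"A": 9.0, "B": None, "C": 1.0},
]

if __name__ == "__main__":
    for pair, (n, m, s) in pairwise_descriptives(data, ["A", "B", "C"]).items():
        print(pair, n, round(m, 3), round(s, 3))
    print(suspect_variables(data, ["A", "B", "C"]))  # ['A', 'C']
```

In this toy data, the values of A used for its correlation with B have mean 2.0, while those used for its correlation with C have mean 4.6. A is therefore flagged, as the text above would predict. C is flagged as well, because the cases it shares with B are also unrepresentative of C.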