Pairwise Deletion of Missing Data vs. Mean Substitution

To avoid losing data through casewise deletion of missing data, you can use one of two other methods: 1) the so-called mean substitution of missing data (replacing all missing data in a variable with the mean of that variable), or 2) pairwise deletion of missing data. Both methods can be requested in many modules; you can also use mean substitution to permanently "remove" missing data from your data set (via the Replace missing data option on the Data menu). Compared to pairwise deletion, mean substitution has both advantages and disadvantages. Its main advantage is that it produces "internally consistent" sets of results ("true" correlation matrices), because every correlation is computed from the same set of cases. The main disadvantages are:

  1. Mean substitution artificially decreases the variation of scores, and the decrease in each variable grows with the number of missing data in that variable (i.e., the more missing data, the more "perfectly average scores" will be artificially added to the data set).

  2. Because it replaces missing data with artificially created "average" data points, mean substitution may considerably change the values of correlations.
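
Both disadvantages can be illustrated with a small sketch in plain Python. The data values, the use of None to mark missing data, and the helper names (mean_substitute, pairwise_pearson, and so on) are assumptions made for this illustration only; they are not part of any particular module or package.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    # Unbiased sample variance (divides by n - 1).
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def pearson(xs, ys):
    # Pearson correlation of two complete, equal-length lists.
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean_substitute(values):
    # Replace each missing value (None) with the mean of the observed values.
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]


def pairwise_pearson(xs, ys):
    # Pairwise deletion: use only the cases where both variables are observed.
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    return pearson([p[0] for p in pairs], [p[1] for p in pairs])


# Hypothetical data; None marks a missing value.
x = [1.0, 2.0, None, 4.0, 5.0, None, 7.0, 8.0]
y = [2.0, 1.0, 4.0, None, 6.0, 5.0, 8.0, 7.0]

x_observed = [v for v in x if v is not None]
x_filled = mean_substitute(x)
y_filled = mean_substitute(y)

# Disadvantage 1: the variance shrinks after mean substitution.
var_observed = sample_variance(x_observed)
var_filled = sample_variance(x_filled)

# Disadvantage 2: the correlation differs from the pairwise-deletion value.
r_pairwise = pairwise_pearson(x, y)
r_filled = pearson(x_filled, y_filled)

print("variance of x, observed values only:  ", var_observed)
print("variance of x, after mean substitution:", var_filled)
print("correlation, pairwise deletion:        ", r_pairwise)
print("correlation, after mean substitution:  ", r_filled)
```

In this sketch the substituted values add nothing to the sum of squared deviations while still counting as cases, so the sample variance of x shrinks by the factor (6 - 1)/(8 - 1), where 6 is the number of observed values and 8 is the total number of cases. How far the correlation moves, and in which direction, depends on the particular data.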