Correlations Introductory Overview - Correlations in Non-homogeneous Groups

A lack of homogeneity in the sample from which a correlation was calculated is another factor that can bias the value of the correlation. Imagine a case where a correlation coefficient is calculated from data points that came from two different experimental groups, but this fact is ignored when the correlation is calculated. Let us assume that the experimental manipulation in one of the groups increased the values of both correlated variables, so that the data from each group form a distinctive "cloud" in the scatterplot (as shown in the graph below).

In such cases, a high correlation may result that is entirely due to the arrangement of the two groups and does not represent the "true" relation between the two variables, which may be practically equal to 0 (as could be seen if we looked at each group separately; see the following graph).
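
The effect described above can be reproduced with a small simulation. The following is a minimal sketch in Python with NumPy, not part of STATISTICA; the function names, group sizes, the shift of 5 units, and the random seed are all assumptions chosen for illustration. Within each group the two variables are generated independently (so the "true" correlation is 0), and the manipulation is modeled as a constant added to both variables in one group.

```python
import numpy as np


def simulate_two_groups(n_per_group=200, shift=5.0, seed=0):
    """Return two (n, 2) arrays of uncorrelated points.

    Hypothetical setup: both variables are independent standard normal
    draws in each group; the 'treated' group has `shift` added to both
    variables, mimicking a manipulation that raises both measures.
    """
    rng = np.random.default_rng(seed)
    control = rng.normal(0.0, 1.0, size=(n_per_group, 2))
    treated = rng.normal(0.0, 1.0, size=(n_per_group, 2)) + shift
    return control, treated


def pearson_r(points):
    """Pearson correlation between the two columns of an (n, 2) array."""
    return float(np.corrcoef(points[:, 0], points[:, 1])[0, 1])


control, treated = simulate_two_groups()

# Pool the groups while ignoring group membership, as in the text.
pooled = np.vstack([control, treated])

r_pooled = pearson_r(pooled)
r_control = pearson_r(control)
r_treated = pearson_r(treated)

print(f"pooled r        = {r_pooled:.3f}")
print(f"control-only r  = {r_control:.3f}")
print(f"treated-only r  = {r_treated:.3f}")
```

Under these assumptions (equal group sizes, unit variance within groups, no within-group relation), the expected pooled correlation is (d^2/4) / (1 + d^2/4) for a shift of d, which is about 0.86 for d = 5, while each within-group correlation fluctuates around 0. The exact printed values depend on the random draws.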

If you suspect the influence of such a phenomenon on your correlations and know how to identify such "subsets" of data, try to run the correlations separately in each subset of observations. For example, you could use the Breakdowns option or the Categorized Scatterplots option. If you do not know how to identify the hypothetical subsets, try to examine the data with some of the exploratory multivariate techniques offered in STATISTICA (e.g., Cluster Analysis). See also Exploratory Data Analysis and Data Mining Techniques.
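
Outside STATISTICA, the same check (correlations computed separately in each known subset) can be sketched in a few lines. The example below uses Python with NumPy; the function name `correlations_by_group` and the toy data are hypothetical and serve only to illustrate the comparison. It assumes the group label of every observation is already known.

```python
import numpy as np


def correlations_by_group(x, y, labels):
    """Return the pooled Pearson r and one r per distinct label.

    Hypothetical helper: `labels` holds the known group membership of
    each observation; each group needs at least two non-constant points.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    labels = np.asarray(labels)

    results = {"pooled": float(np.corrcoef(x, y)[0, 1])}
    for label in np.unique(labels):
        mask = labels == label  # select the observations of this group
        results[str(label)] = float(np.corrcoef(x[mask], y[mask])[0, 1])
    return results


# Toy data: within each group x and y are unrelated, but group B
# has both variables shifted upward by 10 units.
x = [1, 2, 3, 4, 11, 12, 13, 14]
y = [3, 1, 4, 2, 13, 11, 14, 12]
labels = ["A", "A", "A", "A", "B", "B", "B", "B"]

result = correlations_by_group(x, y, labels)
for name, r in result.items():
    print(f"{name}: r = {r:.3f}")
```

For this toy data the pooled correlation is 200/210, about 0.95, while the correlation within each group is 0 up to floating-point rounding. A large gap of this kind between the pooled and the within-group values is the pattern that suggests the overall correlation reflects the arrangement of the groups.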