Outliers

Outliers are, by definition, atypical and infrequent observations: data points that do not appear to follow the characteristic distribution of the rest of the data. They may reflect genuine properties of the underlying phenomenon (variable), or they may be due to measurement errors or other anomalies that should not be modeled. Because of the way the regression line is determined in Multiple Regression (it minimizes the sum of squared distances of the data points from the line, not the sum of simple distances), outliers have a profound influence on the slope of the regression line and, consequently, on the value of the correlation coefficient; a single outlier is capable of changing both considerably. Typically, we believe that outliers represent a random error that we would like to be able to control. Outliers may not only artificially increase the value of a correlation coefficient; they can also decrease the value of a "legitimate" correlation.
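
The effect of a single outlier on the slope and on the correlation can be illustrated with a small numerical sketch. The data below are made up for illustration (ten points scattered closely around y = 2x, plus one hypothetical outlier at x = 11, y = 0), and the helper name fit_line is an assumption of this example, not part of any particular package. It fits an ordinary least-squares line to one predictor and computes the Pearson correlation.

```python
import math

def fit_line(xs, ys):
    """Return (slope, intercept, r) for a least-squares line through paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # minimizes the sum of squared residuals
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)        # Pearson correlation coefficient
    return slope, intercept, r

# Illustrative (made-up) data: ten points close to the line y = 2x
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.2, 17.8, 20.1]

# The same data with one hypothetical outlier appended at (11, 0)
xs_out = xs + [11]
ys_out = ys + [0.0]

clean = fit_line(xs, ys)
with_outlier = fit_line(xs_out, ys_out)

print(f"without outlier: slope={clean[0]:.2f}, r={clean[2]:.2f}")
print(f"with outlier:    slope={with_outlier[0]:.2f}, r={with_outlier[2]:.2f}")
```

With these particular numbers, the one added point lowers the slope from about 2 to about 1 and the correlation from nearly 1 to about 0.5. The opposite case is also possible: a single extreme point added to otherwise unrelated data can raise the correlation.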

The brushing tool can be used to remove outliers interactively by pointing to them in the graph, which makes it possible to explore their influence on a linear or nonlinear function fitted to the data.

See also Recode Outliers and Extreme/Rare Values, and Exploratory Data Analysis and Data Mining Techniques.