Lift Chart

The lift chart provides a visual summary of how useful one or more statistical models are for predicting a binomial (two-category) outcome variable (dependent variable); for multinomial (multiple-category) outcome variables, lift charts can be computed for each category. Specifically, the chart summarizes the gain that we may expect from using the respective predictive models, as compared to using baseline information only.

The lift chart is applicable to most statistical methods that compute predictions (predicted classifications) for binomial or multinomial responses. In STATISTICA, lift charts can be computed in various modules, including General Classification and Regression Trees (GC&RT), GCHAID, Generalized Linear/Nonlinear Models (Logit and Probit models for binomial responses), and General Discriminant Analysis (GDA) (for binomial responses). The Rapid Deployment of Predictive Models module will compute simple and overlaid lift charts (for multiple predictive models) based on models trained and deployed via PMML. This and similar summary charts (see Gains Chart) are commonly used in Data Mining projects when the dependent or outcome variable of interest is binomial or multinomial in nature.

Example. The following scenario illustrates how the lift chart is constructed. Suppose you have a mailing list of previous customers of your business, and you want to offer those customers an additional service by mailing an elaborate brochure and other materials describing the service. During previous similar mail-out campaigns, you collected useful information about your customers (e.g., demographic information, previous purchasing patterns) that you could relate to the response, i.e., whether the respective customers responded to your mail solicitation. Also, from those prior campaigns, you were able to estimate the baseline response rate at approximately 7 percent, i.e., 7% of all customers who received a similar offer by mail responded (purchased the additional service).

Given this baseline response rate (7%) and the cost of the mail-out, sending the offer to all customers would result in a net loss. Hence, you want to use statistical analyses to help you identify the customers who are most likely to respond. Suppose you use STATISTICA Classification and Regression Trees (C&RT) to build such a model based on the data collected in the previous mail-out campaign. You can now select only the 10 percent of the customers from the mailing list who, according to the predictions from the C&RT model, are most likely to respond. If among those customers (selected by the model) the response rate is 14 percent (as opposed to the 7% baseline rate), then the relative gain or lift value due to using the predictive model can be computed as 14% / 7% = 2. In other words, by using STATISTICA C&RT to select 10% of customers from the mailing list, you were able to do twice as well as you would have done using simple random selection.
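
The arithmetic in this example can be sketched in a few lines of Python. The customer counts below (a list of 10,000 customers with 700 responders overall, and 140 responders among the 1,000 customers selected by the model) are hypothetical numbers chosen only to reproduce the 14% and 7% rates from the example, and the function name lift_from_counts is illustrative; it is not part of STATISTICA.

```python
def lift_from_counts(responders_selected: int, n_selected: int,
                     responders_total: int, n_total: int) -> float:
    """Lift = response rate in the model-selected group / baseline response rate."""
    if n_selected <= 0 or n_total <= 0:
        raise ValueError("group sizes must be positive")
    if responders_total <= 0:
        raise ValueError("baseline response rate must be positive")
    selected_rate = responders_selected / n_selected   # e.g., 14%
    baseline_rate = responders_total / n_total         # e.g., 7%
    return selected_rate / baseline_rate


# Hypothetical counts matching the rates in the example (assumed, not from real data):
# 1,000 customers selected by the model (top 10% of 10,000), 140 of whom responded;
# 700 of all 10,000 customers responded.
example_lift = lift_from_counts(responders_selected=140, n_selected=1000,
                                responders_total=700, n_total=10000)
print(f"lift for the top 10%: {example_lift:.2f}")
```

A lift above 1 means the model-selected group responds at a higher rate than a randomly selected group of the same size; a lift of exactly 1 means the model offers no improvement over random selection.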

Analogous lift values can be computed for each percentile of the population (customers on the mailing list). For example, you could compute separate lift values for selecting the top 20% of customers whom the model predicts to be most likely to respond, the top 30%, and so on. The lift values for the different percentiles can then be connected by a line that typically descends slowly and merges with the baseline (a lift of 1) when all customers (100%) are selected.
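
As a rough sketch of how such a curve could be computed outside STATISTICA, the following Python function ranks customers by a model score, takes the top fraction at each depth, and divides the response rate in that group by the overall (baseline) rate. The function name cumulative_lift, the scores, and the ten-customer data set are all assumptions made for illustration; they do not come from the example above or from any STATISTICA output.

```python
from typing import List, Sequence, Tuple


def cumulative_lift(scores: Sequence[float], responded: Sequence[int],
                    fractions: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (fraction selected, lift) pairs for the top-scoring customers."""
    n = len(scores)
    if n == 0 or n != len(responded):
        raise ValueError("scores and responded must be non-empty and equal in length")
    total_responders = sum(responded)
    if total_responders == 0:
        raise ValueError("at least one responder is needed to define the baseline")
    baseline_rate = total_responders / n

    # Rank customers from most to least likely to respond, according to the model.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)

    curve = []
    for fraction in fractions:
        k = max(1, round(fraction * n))            # number of customers selected
        hits = sum(responded[i] for i in order[:k])
        curve.append((fraction, (hits / k) / baseline_rate))
    return curve


# Hypothetical data: ten customers, model scores, and observed responses (1 = responded).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
responded = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

lift_curve = cumulative_lift(scores, responded, fractions=[0.2, 0.5, 1.0])
for fraction, value in lift_curve:
    print(f"top {fraction:.0%}: lift = {value:.2f}")
```

With this toy data the lift is highest for the smallest selection and falls to 1 when every customer is selected, which is the descending shape described above. Plotting the pairs for several models on the same axes would give an overlaid lift chart.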

If more than one predictive model is used, multiple lift charts can be overlaid (as shown in the illustration above) to provide a graphical summary of the utility of different models.

See also Rapid Deployment of Predictive Models and Gains Chart.