Weight of Evidence

The Weight of Evidence or WoE value is a widely used measure of the “strength” of a grouping for separating good and bad risk (default). It is computed from the basic odds ratio:

(Distribution of Good Credit Outcomes) / (Distribution of Bad Credit Outcomes)

or Distr Goods / Distr Bads for short, where Distr is the proportion of all Goods (or of all Bads) that falls into the respective group, i.e., the group's count expressed relative to the column total of Goods (or Bads).

Specifically, the Weight of Evidence value for a group consisting of n observations is computed as:

WoE = ln(Distr Goods / Distr Bads) * 100

The value of WoE will be 0 if the ratio Distr Goods / Distr Bads is equal to 1.0. If the distribution (percentage) of Bads in a group is greater than the distribution of Goods, the ratio will be less than 1.0 and the WoE will be a negative number; if the distribution of Goods is greater than the distribution of Bads in a group, the WoE value will be a positive number.
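
The sign behavior described above can be checked with a minimal Python sketch. The helper name woe, its parameters, and the counts are illustrative assumptions, not part of the module; the factor of 100 follows the scaling used in the Age example at the end of this section.

```python
import math

def woe(goods, bads, total_goods, total_bads, scale=100.0):
    """Weight of Evidence for one group (hypothetical helper).

    goods, bads: counts of Goods and Bads in the group.
    total_goods, total_bads: column totals over all groups.
    scale: 100 mirrors the x100 convention used in the Age example.
    """
    if goods <= 0 or bads <= 0:
        raise ValueError("WoE is undefined when a group has no Goods or no Bads")
    distr_goods = goods / total_goods
    distr_bads = bads / total_bads
    return math.log(distr_goods / distr_bads) * scale

# Illustrative counts, not taken from the text
balanced = woe(50, 50, 500, 500)     # equal shares of Goods and Bads
mostly_bad = woe(20, 60, 500, 500)   # larger share of Bads
mostly_good = woe(60, 20, 500, 500)  # larger share of Goods
print(balanced, mostly_bad, mostly_good)
```

A group with no Goods or no Bads has no finite WoE, so the sketch raises an error in that case rather than guessing a smoothing rule.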

The Automated WoE Coding module will identify the best WoE coding solutions for continuous and categorical (discrete) predictors, and also compute constrained solutions with simple WoE functions over the value range of the respective predictors.

WoE and Logistic Regression. The WoE recoding of predictors is particularly well suited for subsequent modeling using Logistic Regression. Specifically, logistic regression fits a linear equation in the predictors (or WoE-coded predictors) to the logit-transformed binary (Goods/Bads) dependent or Y variable. The logit transformation is simply the log of the odds, i.e., ln(p(Goods)/p(Bads)). Because WoE values are themselves expressed on a log-odds scale, WoE-coded predictors are all prepared and coded to the same (WoE) scale, and the parameters in the logistic regression equation can be compared directly, for example, when using the modeling tools for Stepwise Model Builder.
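
The link between WoE and the logit can be illustrated with a short Python sketch. It does not fit a model; it only checks the arithmetic identity that, for a single grouped predictor, the observed log odds in each group equal the overall log odds plus the group's WoE expressed in natural-log units (without the factor of 100). The group labels and counts are made up for illustration.

```python
import math

# Illustrative (goods, bads) counts per group of a single predictor
groups = {"low": (30, 60), "mid": (120, 90), "high": (250, 50)}

total_goods = sum(g for g, _ in groups.values())
total_bads = sum(b for _, b in groups.values())
base_log_odds = math.log(total_goods / total_bads)  # overall ln(Goods/Bads)

decomposition = {}
for name, (g, b) in groups.items():
    # WoE in natural-log units (no x100 scaling)
    woe_ln = math.log((g / total_goods) / (b / total_bads))
    group_log_odds = math.log(g / b)
    # Observed log odds = overall log odds + 1.0 * WoE
    decomposition[name] = (group_log_odds, base_log_odds + woe_ln)
    print(f"{name}: {group_log_odds:.4f} = {base_log_odds:.4f} + {woe_ln:.4f}")
```

Under this identity, a logistic regression on one WoE-coded predictor reproduces the group log odds with an intercept equal to the overall log odds and a coefficient of 1 on the unscaled WoE, which is one way to see why coefficients of WoE-coded predictors are on a common scale.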

Information Value (IV)

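The Information Value (IV) of a predictor is related to the sum of the (absolute) values of the WoE values over all groups. Thus, it expresses the amount of diagnostic information of a predictor variable for separating the Goods from the Bads. Specifically, given a predictor with n groups, each with a certain Distribution of Goods and Bads, the Information Value (IV) for that predictor can be computed as:

IV = Sum over groups i = 1, ..., n of (Distr Goods_i - Distr Bads_i) * ln(Distr Goods_i / Distr Bads_i)

Each term in this sum is non-negative, because the difference and the logarithm always have the same sign.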

According to Siddiqi (2006), by convention, the values of the IV statistic can be interpreted as follows. If the IV statistic is:

• Less than 0.02, the predictor is not useful for modeling (separating the Goods from the Bads)

• 0.02 to 0.1, the predictor has only a weak relationship to the Goods/Bads odds ratio

• 0.1 to 0.3, the predictor has a medium strength relationship to the Goods/Bads odds ratio

• 0.3 or higher, the predictor has a strong relationship to the Goods/Bads odds ratio.
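
The IV computation and the interpretation bands above can be sketched in Python as follows. The function names and counts are illustrative assumptions. The IV form used here, the sum over groups of (Distr Goods - Distr Bads) * ln(Distr Goods / Distr Bads), is the usual one and is consistent with the 0.022 contribution reported in the Age example. The list does not say which band a boundary value such as exactly 0.1 belongs to; this sketch assigns it to the higher band.

```python
import math

def information_value(groups):
    """IV for one predictor (hypothetical helper).

    groups: list of (goods, bads) counts, one pair per group.
    """
    total_goods = sum(g for g, _ in groups)
    total_bads = sum(b for _, b in groups)
    iv = 0.0
    for goods, bads in groups:
        distr_goods = goods / total_goods
        distr_bads = bads / total_bads
        # Each term is >= 0: the difference and the log share the same sign
        iv += (distr_goods - distr_bads) * math.log(distr_goods / distr_bads)
    return iv

def iv_strength(iv):
    """Map an IV value to the conventional bands listed above."""
    if iv < 0.02:
        return "not useful"
    if iv < 0.1:
        return "weak"
    if iv < 0.3:
        return "medium"
    return "strong"

# Illustrative counts, not taken from the text
example_groups = [(30, 60), (120, 90), (250, 50)]
example_iv = information_value(example_groups)
print(round(example_iv, 4), iv_strength(example_iv))
```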

Example. This example illustrates the computations of the WoE statistic for different coded value ranges for a predictor variable Age, and the resultant Information Value for this predictor.

For age group 21-24, there are 82 Goods and 52 Bads, or 0.117 and 0.173 Goods and Bads respectively, when expressed as proportions of the total number of Goods and Bads. The WoE value for that group is ln(0.11714/0.17333)*100=-39.18; likewise, the respective Contribution of that group to the overall Information Value (IV) is 0.022.
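
The figures for the 21-24 age group can be reproduced with a few lines of Python. The column totals of 700 Goods and 300 Bads are not stated in the text; they are inferred here from the reported proportions (82/0.11714 and 52/0.17333) and should be read as an assumption.

```python
import math

goods, bads = 82, 52                 # counts for age group 21-24 (from the text)
total_goods, total_bads = 700, 300   # assumed totals, inferred from the proportions

distr_goods = goods / total_goods
distr_bads = bads / total_bads

woe_21_24 = math.log(distr_goods / distr_bads) * 100
iv_contribution = (distr_goods - distr_bads) * math.log(distr_goods / distr_bads)

print(round(distr_goods, 5), round(distr_bads, 5))  # 0.11714 0.17333
print(round(woe_21_24, 2))                          # -39.18
print(round(iv_contribution, 3))                    # 0.022
```

Note that the WoE carries the factor of 100 while the IV contribution uses the unscaled logarithm, which is what yields 0.022 rather than 2.2.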