Weight of Evidence (WoE) and Information Value (IV): how to use them in EDA and Model Building?
Weight of Evidence (WoE) and Information Value (IV) can be used to measure the predictive power of an independent variable. WoE helps us understand whether a particular class of an independent variable has a higher distribution of goods or bads.
How to calculate WoE and IV?
If we consider a binary classification where our target variable is bad (0) / good (1) loans, and we have a feature called Age Group with 3 classes (18–35, 36–55, 56–70), we can calculate the WoE of each class using the equation below:

WoE of a class = ln( (goods in the class / total goods) / (bads in the class / total bads) )
In the example below, there is a total of 7302 Good (1) observations, of which 2000 fall in the 18–35 age group. Similarly, of a total of 693 Bad (0) observations, 400 belong to the 18–35 class.
WoE for the 18–35 class = ln( (2000/7302) / (400/693) ) = ln(0.2739 / 0.5772) ≈ -0.75
The negative WoE value of the 18–35 class denotes that the distribution of bads is greater than the distribution of goods for that age group.
Now, Information Value can be calculated using the equation below:

IV of a class = WoE * (proportion of all goods in the class - proportion of all bads in the class)

The overall IV of a variable is the sum of the IVs of its classes.

So, for the 18–35 class, IV = -0.75 * (0.27 - 0.58) = 0.23

The overall Information Value of Age Group = (0.23 + 0.08 + 0.09) = 0.40, where 0.08 and 0.09 are the IVs of the 36–55 and 56–70 classes.
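To make the arithmetic concrete, here is a minimal pandas sketch of the same calculation. The 18–35 counts and the overall totals come from the example above; the counts for the 36–55 and 56–70 groups are hypothetical, back-fitted only so the per-class IVs match the quoted .08 and .09.

```python
import numpy as np
import pandas as pd

# 18-35 counts and the totals (7302 goods, 693 bads) are from the example;
# the 36-55 and 56-70 counts are hypothetical, chosen to match the quoted IVs.
counts = pd.DataFrame(
    {"good": [2000, 2800, 2502], "bad": [400, 160, 133]},
    index=["18-35", "36-55", "56-70"],
)

dist_good = counts["good"] / counts["good"].sum()  # proportion of all goods
dist_bad = counts["bad"] / counts["bad"].sum()     # proportion of all bads

counts["woe"] = np.log(dist_good / dist_bad)       # WoE = ln(%goods / %bads)
counts["iv"] = counts["woe"] * (dist_good - dist_bad)

print(counts.round(2))               # per-class WoE: -0.75, 0.51, 0.58
print(round(counts["iv"].sum(), 2))  # 0.39; the rounded per-class IVs sum to .40
```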
Where to use WoE?
A. Performing Binning of features using WoE Analysis:
During EDA, we often bin categorical and continuous variables. Assume a dataset has Age as a continuous variable and Good/Bad Loan as the categorical target variable. We might be interested in finding logical separations for creating bins of different age groups. First, we create a larger number of age groups and calculate the WoE of each group. If the WoE values show a monotonic trend (either descending or ascending), we can confirm that our bins follow a general trend. If the trend is not monotonic, we merge bins into new groups and check the WoE values again. Thus, using WoE, we can create logical bins for further data analysis.
In our first attempt, we created 5 bins for the continuous variable 'Age', but no monotonic trend could be seen. So, in the next attempt, we merged two groups and created 3 bins, as sketched below. Now we can see a monotonic trend (decreasing WoE values) across the classes of 'Age'.
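A minimal sketch of this workflow, assuming a DataFrame df with a continuous age column and a 0/1 good_loan target (both names, and the 5-bin edges, are placeholders):

```python
import numpy as np
import pandas as pd

def woe_table(bins: pd.Series, target: pd.Series) -> pd.DataFrame:
    """Compute WoE per bin; target is 1 = good, 0 = bad."""
    grouped = target.groupby(bins).agg(good="sum", total="count")
    grouped["bad"] = grouped["total"] - grouped["good"]
    grouped["woe"] = np.log(
        (grouped["good"] / grouped["good"].sum())
        / (grouped["bad"] / grouped["bad"].sum())
    )
    return grouped

# First attempt: 5 bins. If the WoE trend is not monotonic, merge bins.
bins5 = pd.cut(df["age"], bins=[18, 28, 38, 48, 58, 70])
t5 = woe_table(bins5, df["good_loan"])

if not (t5["woe"].is_monotonic_increasing or t5["woe"].is_monotonic_decreasing):
    # Second attempt: merge down to 3 bins and re-check the trend.
    bins3 = pd.cut(df["age"], bins=[18, 35, 55, 70])
    t3 = woe_table(bins3, df["good_loan"])
```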
We can follow the same approach to group ordinal and nominal categorical variables with a large number of classes into a smaller number of bins.
B. Using IV for Feature Selection for Binary Logistic Regression:
We can calculate the overall Information Value of an independent variable: if the value is between .3 and .5, we can consider the variable a strong predictor, and any variable with an IV of less than .02 can be excluded from our binary logistic regression model. The commonly used rule of thumb relating IV to the predictive power of a variable is:

IV less than 0.02: not useful for prediction
IV between 0.02 and 0.1: weak predictor
IV between 0.1 and 0.3: medium predictor
IV between 0.3 and 0.5: strong predictor
IV greater than 0.5: suspicious, too good to be true
In our previous example, 'Age' has IV = .40, which makes it a strong predictor of our target variable, i.e. whether a loan is bad or good.
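As a quick sketch, that rule of thumb can be wrapped in a small helper for screening candidate features:

```python
def iv_strength(iv: float) -> str:
    """Label a variable's predictive power by its Information Value."""
    if iv < 0.02:
        return "not useful"        # usually excluded from the model
    if iv < 0.1:
        return "weak predictor"
    if iv < 0.3:
        return "medium predictor"
    if iv < 0.5:
        return "strong predictor"
    return "suspicious"            # too good to be true; check for leakage

print(iv_strength(0.40))  # 'Age' from our example -> "strong predictor"
```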
C. Using WoE values to transform a Continuous Variable into a discrete variable:
In the above example we converted the continuous variable 'Age' into a categorical variable to perform EDA. We can further replace those categories with their WoE values, and the discretized 'Age' variable can then be used in our logistic regression model for binary prediction. This helps the model become more stable, since a small change in 'Age' will not impact the model. But it comes at a price.
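Continuing the sketch above, the replacement might look like this; the WoE numbers are the illustrative ones computed earlier, not values from real data:

```python
import pandas as pd

# Map each age bin to its WoE value. The numbers come from the illustrative
# 3-bin table computed earlier; df["age"] is the placeholder column from above.
woe_map = {"18-35": -0.75, "36-55": 0.51, "56-70": 0.58}

df["age_woe"] = (
    pd.cut(df["age"], bins=[18, 35, 55, 70], labels=["18-35", "36-55", "56-70"])
    .astype(str)
    .map(woe_map)
)
```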
You can notice an interesting fact here: after reducing the number of bins from 5 to 3, the overall Information Value (IV) of the 'Age' variable also dropped a bit. Whenever we transform a continuous variable into a categorical one, or reduce the number of classes of a categorical variable through binning, we also lose some of that variable's predictive power.
If the overall IV of a variable is <= .1, we consider it a weak predictor; if the IV is between .3 and .5, it is considered a very good predictor.
D. Using WoE values to transform Categorical Variables for model building:
During binary classification using Logistic Regression, any categorical variables must be transformed into numeric variables. One-Hot Encoding and Label Encoding both have disadvantages. I am not going into too many details, but one disadvantage worth noting: when a nominal variable has too many classes, One-Hot Encoding may lead to the Curse of Dimensionality.
The WoE score can be used to transform a categorical variable into a numeric one: the calculated WoE values of the different classes simply replace their original values. Hence there is no need to perform any separate encoding for model building.
Missing values can also be grouped into their own class (say, 'Unknown') and imputed with that class's computed WoE value.
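Putting the last two points together, here is a minimal WoE encoder for a categorical feature that groups missing values into an 'Unknown' class (df, home_ownership, and good_loan are placeholder names):

```python
import numpy as np
import pandas as pd

def woe_encode(df: pd.DataFrame, col: str, target: str) -> pd.Series:
    """Replace each category of df[col] with its WoE; NaN -> 'Unknown'."""
    cats = df[col].fillna("Unknown")
    good = df[target].groupby(cats).sum()          # goods per class
    bad = df[target].groupby(cats).count() - good  # bads per class
    # NB: a class with zero goods or zero bads gives an infinite WoE;
    # real scorecards apply smoothing or merge such classes first.
    woe = np.log((good / good.sum()) / (bad / bad.sum()))
    return cats.map(woe)

df["home_ownership_woe"] = woe_encode(df, "home_ownership", "good_loan")
```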
WoE is a simple yet powerful method and can be very helpful in exploratory/confirmatory data analysis and in building binary logistic regression models. Remember that WoE and Information Value are designed mainly for binary logistic regression models. If you are planning to use another classification algorithm such as SVM, Decision Tree, or Random Forest, they are not going to give you optimal results. One reason: a Decision Tree or Random Forest can identify non-linear relationships between independent and target variables, so selecting features based on IV may exclude a relevant feature from the model.