Predict Customer Churn with Gradient Boosting
Customer churn is a key predictor of the long-term success or failure of a business. But with so much customer data available, which model should you use? This post shows that, among the models compared here, gradient boosting is the most accurate way of predicting customer attrition. I'll show you how to build your own analysis using gradient boosting to identify and save those at-risk customers!
Why predict customer churn?
Customer retention should be a top priority of any business, as acquiring new customers is often far more expensive than keeping existing ones. With the numerous options in the market, it is no longer a given that long-standing customers will remain loyal. It is therefore vital that companies can proactively identify the customers most at risk of leaving and take preventative measures.
Predictive models of customer churn can show the overall rate of attrition, and knowing how the churn rate varies over time, across customer cohorts and across product lines can provide further valuable insights. Yet customers also vary enormously in their behaviors and preferences, which means a simple rule-of-thumb analysis will not work. This is where a predictive model using gradient boosting can help.
First, I'm going to describe the data. Then I'll use gradient boosting to predict who will churn and who will stay. Finally, I'll benchmark my result against other models.
The data
I'll aim to predict Churn, a binary variable indicating whether or not a customer of a telecoms company left in the last month.
To do this I'll use 19 variables including:
- Length of tenure in months.
- Types of services signed up for such as phone, internet and movie streaming.
- Demographic information.
- Monthly charges, type of contract and billing.
The full data set is available here.
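If you want to follow along outside Displayr, a minimal sketch of loading the data in R might look like this (the file name telco_churn.csv is a placeholder for wherever you save the download):

```r
# Load the churn data; the file name is a placeholder for the downloaded CSV.
churn <- read.csv("telco_churn.csv", stringsAsFactors = TRUE)

# Churn is the binary target ("Yes"/"No"); the other columns are predictors.
str(churn$Churn)
```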
The breakdown of Churn is shown below. If we predict No (a customer will not churn) for every case, we are correct 73% of the time. This establishes a baseline: 73% is the minimum accuracy that any model should improve on.
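In R, the baseline accuracy is simply the share of the majority class (a sketch, assuming the churn data frame from above):

```r
# Always predicting the majority class ("No") is correct for the share
# of customers who did not churn -- roughly 73% in this data set.
baseline <- max(prop.table(table(churn$Churn)))
baseline
```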
Gradient boosting
In this earlier post I explained gradient boosting. Gradient boosting sits alongside regression, decision trees, support vector machines and random forests: all are supervised learning algorithms capable of fitting a model to training data and making predictions.
A common strategy when working with any of these models is to split the data into a training sample and a testing sample. The model learns the associations between the predictor variables and the target outcome from the training sample. The testing sample is used to provide an unbiased estimate of the prediction accuracy on unseen data.
I randomly split the data into a 70% training sample and a 30% testing sample. I then perform gradient boosting with an underlying tree model. The chart below shows the 10 most important variables. We learn that having a monthly contract, length of tenure and the amount of charges are useful predictors of churn.
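A sketch of this step in R, using the xgboost package directly rather than Displayr's flipMultivariates wrapper (the seed and settings are illustrative):

```r
library(xgboost)

set.seed(123)

# 70/30 train/test split.
train_rows <- sample(nrow(churn), size = floor(0.7 * nrow(churn)))
train <- churn[train_rows, ]
test  <- churn[-train_rows, ]

# xgboost needs a numeric matrix, so one-hot encode the predictors.
x_train <- model.matrix(Churn ~ . - 1, data = train)
y_train <- as.numeric(train$Churn == "Yes")

# Fit a boosted tree model; the settings here are illustrative defaults.
fit <- xgboost(data = x_train, label = y_train,
               nrounds = 100, objective = "binary:logistic",
               verbose = 0)

# The 10 most important variables, ranked by gain.
head(xgb.importance(model = fit), 10)
```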
How accurately can we predict customer churn with gradient boosting?
Gradient boosting has various internal parameters known generically as hyper-parameters. These settings determine the size of the underlying trees and the impact that each round of boosting has on the overall model. It can be time-consuming to explore all of the possibilities to find the best values. To create the model below I automatically performed a grid search over 36 different combinations of hyper-parameters, selecting the best set by 5-fold cross-validation.
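The caret package offers one way to automate such a search. Below is a hypothetical grid of 36 combinations (2 × 3 × 3 × 2) with 5-fold cross-validation; the values are illustrative, not the exact grid behind the results in this post:

```r
library(caret)

# A hypothetical grid of 36 hyper-parameter combinations.
grid <- expand.grid(nrounds = c(50, 100),
                    max_depth = c(2, 4, 6),
                    eta = c(0.05, 0.1, 0.3),
                    gamma = 0,
                    colsample_bytree = 1,
                    min_child_weight = 1,
                    subsample = c(0.8, 1))

# 5-fold cross-validation on the training sample picks the best set.
tuned <- train(Churn ~ ., data = train, method = "xgbTree",
               trControl = trainControl(method = "cv", number = 5),
               tuneGrid = grid)
tuned$bestTune
```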
We've already established a baseline of 73% from always predicting that a customer will not churn. On its own, that is little better than a shot in the dark for any individual customer. However, when we add information such as a customer's monthly charges, contract type and length of tenure, we can learn much more about who is likely to leave.
From this information we can more accurately pinpoint who will churn: prediction accuracy rises by around 8 percentage points to 80.87%. This gives us a much greater edge in identifying the factors that lead to customer attrition, and in the crucial business of customer retention!
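Accuracy comes from comparing the tuned model's predictions against the held-out testing sample, along these lines:

```r
# Accuracy on the 30% testing sample the model has never seen.
pred <- predict(tuned, newdata = test)
mean(pred == test$Churn)
```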
Why choose gradient boosting over other models?
In the same way that I just fitted a gradient boosting model, we can fit other models. I tried three other approaches, each time following the same procedure as above: selecting the same variables, fitting with the training sample and calculating accuracy on the testing sample. The results are:
Model | Accuracy
---|---
Gradient Boosted Tree | 80.87%
CART | 79.21%
Random Forest | 79.94%
Linear Discriminant Analysis | 79.97%
While this is not a comprehensive comparison, gradient boosting achieves the highest accuracy among these models.
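To reproduce a comparison like this in R, one option is to fit each model with a standard package and score it with the same accuracy measure; the packages below are common implementations, not necessarily what Displayr uses internally:

```r
library(rpart)          # CART
library(randomForest)   # random forest
library(MASS)           # linear discriminant analysis

# Score predictions against the same held-out testing sample.
accuracy <- function(pred) mean(pred == test$Churn)

# CART.
cart <- rpart(Churn ~ ., data = train, method = "class")
accuracy(predict(cart, test, type = "class"))

# Random forest.
rf <- randomForest(Churn ~ ., data = train)
accuracy(predict(rf, test))

# Linear discriminant analysis.
lda_fit <- lda(Churn ~ ., data = train)
accuracy(predict(lda_fit, test)$class)
```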
Try it out
The analysis in this post was performed in Displayr using R. The flipMultivariates package, which uses the xgboost package, performs the machine learning calculations. You can try this analysis for yourself in Displayr.