Feature Selection Techniques. Churn Prediction in the Telecom Industry. A Quest for Interpretability

Abstract:

Telecommunications is one of the industries in which the customer base plays an important role in maintaining stable income, so providers pay special attention to preventing customer migration to competitors. The purpose of this paper is to detect which of the many variables in the data set are important drivers of customer migration from a Romanian mobile telecommunications company to another provider. We aim to identify which characteristics in the predictive model need to be monitored and evaluated, which of their values expose the company to the risk of churn, and which thresholds separate customers who migrate to the competition from those who do not. The churn analysis is developed for postpaid clients. We used a Balanced Random Forest for the churn model and three feature selection tools: Permutation Importance, Partial Dependence Plot and SHAP. Applying them to the churn model, we ranked the predictive indicators according to their importance, their predictive power and the distribution of the impact that each characteristic has in the model. According to Permutation Importance, the drivers of churn are: the number of months since the offer on the account was last changed, the number of minutes consumed outside the company's network, the value of the invoice, the age of the customer and the customer's tenure with this telecommunications operator. The Partial Dependence Plot identifies the churn risk areas faced by the Romanian telecommunications company for each of the indicators listed, such as younger clients or clients with outdated offers (unchanged for almost two years). SHAP also shows that a large number of months since the last offer change, a significant percentage of minutes received from competing networks or a short tenure in the network increases the estimated churn per customer.
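To make the first two tools concrete, the following is a minimal sketch of how permutation importance and partial dependence are computed. It is not the paper's implementation: the data are synthetic, the feature names are illustrative, and `churn_score` is a hypothetical hand-written rule standing in for the Balanced Random Forest, with arbitrary coefficients. SHAP is left out because it requires a dedicated library.

```python
import random

# Illustrative feature names (assumptions, not the paper's variable names).
FEATURES = ["months_since_offer_change", "offnet_minutes_share", "customer_age"]


def churn_score(row):
    """Hypothetical stand-in model: returns a churn probability.

    It deliberately ignores customer_age, so that feature should
    receive zero importance in this toy setting.
    """
    months, offnet_share, _age = row
    return 0.05 + (0.4 if months >= 20 else 0.0) + 0.5 * offnet_share


def predict(row):
    """Hard churn / no-churn decision at a 0.5 probability cutoff."""
    return 1 if churn_score(row) >= 0.5 else 0


def make_data(n, rng):
    """Synthetic customers; labels are drawn from the stand-in model."""
    X = [[rng.randint(0, 36), rng.random(), rng.randint(18, 70)] for _ in range(n)]
    y = [1 if rng.random() < churn_score(row) else 0 for row in X]
    return X, y


def accuracy(X, y):
    return sum(predict(r) == t for r, t in zip(X, y)) / len(y)


def permutation_importance(X, y, col, rng, n_repeats=5):
    """Mean drop in accuracy after shuffling one column across customers."""
    baseline = accuracy(X, y)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[col] for r in X]
        rng.shuffle(shuffled)
        X_perm = [r[:col] + [v] + r[col + 1:] for r, v in zip(X, shuffled)]
        drops.append(baseline - accuracy(X_perm, y))
    return sum(drops) / n_repeats


def partial_dependence(X, col, grid):
    """Average predicted churn when one feature is forced to each grid value."""
    return {
        v: sum(churn_score(r[:col] + [v] + r[col + 1:]) for r in X) / len(X)
        for v in grid
    }


rng = random.Random(0)
X, y = make_data(2000, rng)

importances = {
    name: permutation_importance(X, y, j, rng) for j, name in enumerate(FEATURES)
}
pd_months = partial_dependence(X, 0, [6, 12, 18, 24, 30])

for name, value in importances.items():
    print(f"{name}: importance = {value:.3f}")
for months, avg in pd_months.items():
    print(f"offer unchanged for {months} months: average churn = {avg:.3f}")
```

In this sketch the partial dependence curve jumps once the offer has been unchanged for 20 months or more, which is the kind of threshold the abstract describes; the location of the jump is a property of the stand-in rule, not a result from the paper.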

 
