Language Model-Enhanced Feature Engineering Framework for Customer Churn Analysis

Abstract:

Customer churn analysis is a critical challenge in many industries and calls for new approaches that improve predictive performance. Customer-generated textual data, such as feedback, reviews, and chat logs, contains valuable signals of satisfaction, dissatisfaction, or propensity to churn, yet existing studies often overlook its potential. Language Models (LMs), with their ability to analyse textual data and extract meaningful insights, offer a promising way to exploit these signals. This study presents a language model-enhanced feature engineering framework for customer churn analysis that combines LMs with domain expertise to generate meaningful features. We introduce a pipeline that first generates interaction features, derived from existing data and validated by domain experts. Fine-tuned LMs are then used for Sentiment Analysis, Emotional Tone Detection, and Topic Modelling to extract meaningful features from text data. These features, combined with expert-guided topic selection and a novel feature, the ‘Normalised-weighted Churn Score’, incrementally improved the classifiers’ performance. We evaluate our pipeline on a publicly available dataset to demonstrate the effectiveness of our approach. The results underline the crucial role of textual data, domain expertise, and LMs in enhancing customer churn analysis.