Modelling Recovery Rates in Mass Debt Portfolios Using Machine Learning Algorithms

Abstract:

This study examines the effectiveness of three machine-learning algorithms (decision tree, random forest, and XGBoost) in predicting recovery rates within large-scale receivables portfolios. The models were trained on real operational data from a major debt-collection organisation in Poland (389,250 cases) using exclusively ex-ante features, which makes them suitable for early-stage valuation and strategic decision-making. Extensive hyperparameter tuning and evaluation using MAE, RMSE, and R² indicate that random forest provides the best balance of accuracy and stability, while XGBoost fits the training data closely but offers limited improvement on unseen data. SHAP-based explainability highlights the dominant role of financial attributes such as claim amount, together with debtor age and regional characteristics. Stable models, particularly random forest, can enhance financial forecasting, strengthen investment decisions in securitised portfolios, and increase overall operational efficiency. The study underscores the potential of ML-driven decision-support tools in improving strategic and economic outcomes in the debt-collection industry.
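As an illustration of the three evaluation metrics named above, the following minimal sketch computes MAE, RMSE, and R² for a set of predicted recovery rates. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the study's data or code; the formulas are the standard textbook definitions.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R^2 for paired lists of actual and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean absolute error: average magnitude of the prediction errors.
    mae = sum(abs(e) for e in errors) / n

    # Root mean squared error: penalises large errors more heavily than MAE.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R^2: share of the variance in y_true explained by the predictions.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return {"mae": mae, "rmse": rmse, "r2": r2}


# Hypothetical recovery rates (fraction of the claim recovered), illustration only.
actual = [0.0, 0.2, 0.5, 1.0]
predicted = [0.1, 0.2, 0.4, 0.8]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

Computing the same metrics on both the training set and a held-out set is one way to expose the pattern described above, where a model fits the training data closely but improves little on unseen data.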