Abstract:
We investigate firm failure using XGBoost, a machine learning algorithm, on a large data set of Romanian companies comprising approximately 360,000 observations. Our predictors are financial ratios calculated from the 2021 financial statements, while the target variable, firm failure (defined as the transition to negative equity), corresponds to the year 2022. The Romanian economy exhibits considerable regional differences in both concentration and sectoral composition. We examine whether regional information improves the predictive performance of the firm failure model, and at what level of granularity. We also test whether separate regional models outperform a single pooled model. We show that aggregate regional information improves model performance, while more granular county-level information offers no additional predictive gains. We further show that single pooled models outperform separate regional models and are able to capture regional differences, suggesting that part of the information contained in the regional variable may also be embedded in other explanatory variables. We use Permutation Feature Importance to study how pooled and regional models assign importance to variable classes and find similar patterns across regions. We also find that regional models transfer reasonably well to other regions, albeit with a small loss in performance. Taken together, these findings suggest that common mechanisms of firm failure operate across regions and are best captured by pooled models, and that the potential specialization advantage of separate regional models does not outweigh the benefit of a larger training set.
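
For readers unfamiliar with Permutation Feature Importance, the idea can be illustrated with a minimal sketch: shuffle one feature column at a time and record how much the model's score drops. The sketch below is illustrative only and is not the pipeline used in this study. The data are synthetic, `toy_model` is a hypothetical stand-in for a fitted classifier (not XGBoost), and accuracy is assumed as the score purely for simplicity.

```python
import random


def accuracy(model, rows, labels):
    """Share of rows where the model's prediction matches the label."""
    hits = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return hits / len(rows)


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving all other features untouched.
            column = [row[j] for row in rows]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Synthetic data (assumption): feature 0 drives the label, feature 1 is noise.
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(500)]
labels = [1 if row[0] > 0.7 else 0 for row in rows]


# Hypothetical stand-in for a fitted classifier; it only looks at feature 0.
def toy_model(row):
    return 1 if row[0] > 0.7 else 0


importances = permutation_importance(toy_model, rows, labels)
print(importances)
```

In this toy setting the first feature receives a clearly positive importance, while the noise feature, which the stand-in model never uses, receives exactly zero. Comparing such importance profiles across a pooled model and region-specific models is the kind of analysis summarized above.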
