Comparing Approaches for Combining Data Sampling and Stacked Autoencoder to Address Bankruptcy Prediction

Abstract:

In bankruptcy prediction, the imbalanced data problem has received considerable research attention in recent years. To this end, several machine learning methods have been proposed to build accurate bankruptcy prediction models. In this paper, we propose a hybrid model that handles imbalanced bankruptcy datasets by combining oversampling techniques with a deep learning approach based on a stacked autoencoder (SAE) and a softmax classifier. To increase the weight of the minority class in the decision region, three oversampling methods, namely SMOTE, Safe-Level SMOTE, and Borderline SMOTE, are used at the first level to balance the training dataset. At the second level, the SAE is applied to further enhance the classification accuracy of the model by reducing the dimensionality and extracting the most important features for the subsequent classification task. Based on the extracted features, the softmax classifier is used to distinguish bankrupt firms from non-bankrupt firms. To assess the classification performance of the proposed classifiers, the area under the curve (AUC), G-mean, and F1-score are used as performance measures. Furthermore, the proposed classifiers are compared with similar existing works on the real, highly imbalanced Polish datasets collected from the UCI Machine Learning Repository. The experimental results suggest that our proposed hybrid technique of Borderline SMOTE+SAE with the softmax classifier significantly improves the performance of bankruptcy prediction and achieves the best results among all algorithm combinations.
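As an illustration of two ingredients named above, the following minimal sketch shows the interpolation step of basic SMOTE and the computation of G-mean and F1-score from predicted labels. It is not the implementation used in this paper: the function names `smote_sample` and `g_mean_and_f1`, the parameters `k`, `n_new`, and `seed`, and the toy data are assumptions made for illustration only. Safe-Level SMOTE, Borderline SMOTE, the SAE, and the softmax classifier are not covered.

```python
import math
import random


def smote_sample(minority, k=2, n_new=4, seed=0):
    """Basic SMOTE sketch: create n_new synthetic minority points.

    Each synthetic point lies on the segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    (Hypothetical helper; parameter names are illustrative.)
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x, excluding x itself
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def g_mean_and_f1(y_true, y_pred, positive=1):
    """Return (G-mean, F1-score), treating `positive` as the bankrupt class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on the minority class
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on the majority class
    precision = tp / (tp + fp) if tp + fp else 0.0

    g_mean = math.sqrt(sensitivity * specificity)
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return g_mean, f1


# Toy data (assumed, not taken from the Polish datasets)
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_sample(minority_points, k=2, n_new=4, seed=0)

y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
g_mean, f1 = g_mean_and_f1(y_true, y_pred)
```

G-mean is the geometric mean of sensitivity and specificity, so it stays low whenever either class is poorly recognised. This is why it is often preferred to plain accuracy on highly imbalanced data.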