Going Concern and Bankruptcy Prediction under Extreme Class Imbalance: Cost-Sensitive Learning, Resampling, and Focal Loss with Explainable Financial-Ratio Portraits
DOI:
https://doi.org/10.69987/JACS.2024.40407
Keywords:
going concern, bankruptcy prediction, imbalanced learning, cost-sensitive learning, focal loss, audit sampling, explainable financial ratios
Abstract
Auditors and creditors face a going-concern screening problem: only a small fraction of firms fail, yet failing to identify a distressed firm is substantially costlier than issuing a false alarm. This paper formulates bankruptcy/going-concern prediction as an extreme class-imbalance ranking task and reports a fully reproducible empirical evaluation on the Polish Companies Bankruptcy benchmark distributed by the UCI Machine Learning Repository (five forecasting horizons; 64 financial ratios; 43,405 firm-year records). We compare three imbalance-aware strategy families: (i) cost-sensitive learning via class-weighted objectives, (ii) resampling via random over-sampling, random under-sampling, and SMOTE, and (iii) focal loss to emphasize hard minority examples. Performance is assessed using the area under the precision–recall curve (AUPRC) and Recall@Top-k, where k represents the fraction of firms audited under a constrained sampling budget. Across all five horizons, tree ensembles dominated linear models, and exploiting missingness patterns was critical: augmenting a cost-sensitive random forest with missing-value indicator features increased AUPRC from 0.480 to 0.640 in the most imbalanced 1stYear case. Overall, the cost-sensitive random forest with missing indicators achieved AUPRC values of 0.640, 0.480, 0.459, 0.567, and 0.601 for the 1stYear to 5thYear cases, respectively, under a stratified 70/30 split (seed=42). The audit-oriented metric showed large operational gains: in 1stYear (3.86% bankrupt), auditing only the top 5% of firms ranked by the best model recovered 62.96% of bankruptcies (Recall@Top-5%), compared with 12.35% for standard logistic regression. Finally, we provide an explainable financial-ratio portrait that summarizes the characteristic liquidity, leverage, and profitability patterns of the model-flagged high-risk cohort, bridging predictive ranking with actionable accounting evidence for going-concern planning.
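
The audit-oriented metric described above, Recall@Top-k, can be illustrated with a minimal sketch. It assumes that k is the fraction of firms audited, that firms are ranked by predicted risk in descending order, and that the audit budget is rounded up to a whole number of firms; the rounding rule, the function name `recall_at_top_k`, and the toy data are illustrative assumptions and are not taken from the paper.

```python
import math


def recall_at_top_k(y_true, scores, k_fraction):
    """Share of all bankruptcies captured when auditing the top k_fraction of firms.

    y_true     : sequence of 0/1 labels (1 = bankrupt)
    scores     : sequence of predicted risk scores (higher = riskier)
    k_fraction : audit budget as a fraction of firms, in (0, 1]
    """
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    if not 0 < k_fraction <= 1:
        raise ValueError("k_fraction must lie in (0, 1]")
    total_bankrupt = sum(y_true)
    if total_bankrupt == 0:
        raise ValueError("recall is undefined without any bankrupt firm")

    # Assumed convention: round the budget up, with a small tolerance so that
    # floating-point noise (e.g. 2.0000000001) does not add an extra firm.
    n_audit = max(1, math.ceil(k_fraction * len(y_true) - 1e-9))

    # Rank firm indices by descending score and keep the audited slice.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(y_true[i] for i in ranked[:n_audit])
    return hits / total_bankrupt


# Toy example (synthetic, not the paper's data): 20 firms, 2 bankrupt.
toy_labels = [0] * 18 + [1, 1]
toy_scores = [0.80] + [0.10] * 17 + [0.90, 0.05]

# Auditing the top 10% (2 firms) selects one healthy and one bankrupt firm.
print(recall_at_top_k(toy_labels, toy_scores, 0.10))
```

Unlike AUPRC, which summarizes the whole ranking, this quantity depends only on the firms that fall inside the audit budget, which is why it maps directly onto a constrained sampling plan.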







