LLM-Explanation-Enhanced Retail Credit Default Prediction with Gradient Boosting on the UCI Default of Credit Card Clients Dataset

Hailin Zhou; Sarah Zhao

doi:10.69987/JACS.2024.40508

Authors

Hailin Zhou Applied Analytics, Columbia University, NY, USA Author
Sarah Zhao Computer Science, USC, CA, USA Author

DOI:

https://doi.org/10.69987/JACS.2024.40508

Keywords:

credit default prediction, retail credit risk, gradient boosting, XGBoost, LightGBM, table-to-text, LLM-style explanation

Abstract

Retail credit scoring still depends on strong tabular learners, but operational use also requires explanations that describe behavior rather than only output a probability. This study evaluated an LLM-explanation-enhanced gradient boosting pipeline on the UCI Default of Credit Card Clients dataset. The benchmark contains 30,000 clients, 23 predictive variables, and no missing values [1], [2]. We transformed each client’s six-month bill and repayment history into a deterministic natural-language risk behavior summary that mimics an analyst-style LLM note while remaining fully reproducible. The summary encoded delinquency sequence, utilization, repayment coverage, bill trend, and explicit risk or protective tags. Structured features were modeled with XGBoost and LightGBM, summaries were modeled with TF-IDF logistic regression, and both branches were fused by fixed weighted late fusion. All numbers reported in the manuscript were empirically measured by executing the supplied code; no illustrative placeholders were retained. On a representative 70/15/15 split, XGBoost with engineered structured features reached AUROC 0.7874 and AUPRC 0.5758, while XGBoost weighted fusion reached AUROC 0.7867, AUPRC 0.5731, F1 0.5524, and accuracy 0.8044. Across five repeated stratified splits, XGBoost weighted fusion achieved the best mean AUROC (0.7943 ± 0.0111) and mean AUPRC (0.5706 ± 0.0181), and LightGBM weighted fusion reached 0.7911 ± 0.0102 AUROC and 0.5686 ± 0.0185 AUPRC. The text-only summary model remained competitive at 0.7869 ± 0.0102 AUROC, showing that the generated explanations preserved most of the discriminatory information. Ablation results showed that repayment-sequence narration, finance descriptors, and explicit risk tags all added measurable value. The findings demonstrate that explanation-oriented table-to-text serialization can improve gradient-boosted retail default prediction while simultaneously producing auditor-friendly behavioral summaries.