A Comparative Study of Multi-source Data Fusion Approaches for Credit Default Early Warning

Authors

  • Jiahui Han, Master of Finance, MIT Sloan School of Management, MA, USA
  • Guanghe Cao, Computer Science, University of Southern California, CA, USA

DOI:

https://doi.org/10.69987/AIMLR.2024.50109

Keywords:

Credit Default Prediction, Multi-source Data Fusion, Ensemble Learning, Feature Engineering

Abstract

This study presents a comparative analysis of multi-source data fusion approaches for early warning of credit defaults in financial institutions. The research integrates heterogeneous data sources, including credit bureau records, transaction behavior patterns, textual financial reports, and macroeconomic indicators. Three fusion strategies (early fusion, late fusion, and hybrid fusion) are systematically evaluated using ensemble machine learning algorithms, including XGBoost, LightGBM, and Random Forest. Experimental results on a real-world dataset comprising 125,847 credit records demonstrate that the hybrid fusion approach achieves the highest predictive performance with an AUC-ROC of 0.8934, outperforming the best single-source credit-bureau model (AUC-ROC 0.8234) by 7.0 percentage points (8.5% relative improvement). Feature importance analysis using SHAP values indicates that transaction behavior features account for 34.2% of total feature importance, whereas NLP-extracted sentiment scores from financial texts account for 18.6%. Statistical tests (e.g., DeLong's test and bootstrap confidence intervals) indicate that the hybrid fusion configuration significantly outperforms the early-fusion baseline (p < 0.001 for AUC).
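The difference between early and late fusion, and the arithmetic behind the reported gain, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names (`early_fusion`, `late_fusion`, `auc_roc`), the equal weighting, and all toy feature and score values are assumptions made for illustration. Hybrid fusion is omitted because the abstract does not specify how it is constructed. Only the two AUC-ROC figures at the end come from the abstract.

```python
from itertools import product

def early_fusion(*sources):
    """Early fusion: concatenate each record's features from all sources into one vector."""
    return [sum((list(row) for row in rows), []) for rows in zip(*sources)]

def late_fusion(*score_lists, weights=None):
    """Late fusion: weighted average of the scores produced by per-source models."""
    n = len(score_lists)
    weights = weights or [1.0 / n] * n  # assumption: equal weights by default
    return [sum(w * s for w, s in zip(weights, scores)) for scores in zip(*score_lists)]

def auc_roc(labels, scores):
    """AUC-ROC as the chance a defaulter is scored above a non-defaulter (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p, q in product(pos, neg))
    return wins / (len(pos) * len(neg))

# Toy records, invented for illustration (1 = default, 0 = no default).
labels = [0, 1, 0, 1]
bureau_features = [[0.2], [0.9], [0.4], [0.7]]
transaction_features = [[0.1, 0.3], [0.8, 0.6], [0.5, 0.2], [0.3, 0.9]]

# Early fusion yields one wide feature vector per record for a single model.
fused_features = early_fusion(bureau_features, transaction_features)

# Late fusion combines the outputs of separately trained per-source models.
bureau_scores = [0.2, 0.9, 0.6, 0.5]       # hypothetical model outputs
transaction_scores = [0.5, 0.6, 0.1, 0.4]  # hypothetical model outputs
fused_scores = late_fusion(bureau_scores, transaction_scores)

# Gain reported in the abstract: absolute (percentage points) vs. relative (%).
single_source_auc, hybrid_auc = 0.8234, 0.8934
abs_gain_pp = (hybrid_auc - single_source_auc) * 100
rel_gain_pct = (hybrid_auc - single_source_auc) / single_source_auc * 100

print(auc_roc(labels, bureau_scores), auc_roc(labels, transaction_scores),
      auc_roc(labels, fused_scores))
print(round(abs_gain_pp, 1), round(rel_gain_pct, 1))
```

On this toy data each single-source score list misranks one defaulter, while their average ranks every defaulter above every non-defaulter. That is the intuition behind combining complementary sources, not evidence about the study's results.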

Published

2024-01-27

How to Cite

Jiahui Han, & Guanghe Cao. (2024). A Comparative Study of Multi-source Data Fusion Approaches for Credit Default Early Warning. Artificial Intelligence and Machine Learning Review, 5(1), 105-116. https://doi.org/10.69987/AIMLR.2024.50109
