An Empirical Comparison of Feature Engineering Strategies from Non-Traditional Data for Thin-File Borrower Credit Assessment

Authors

  • Zhi Luo, Business Analytics, Columbia University, NY, USA
  • Mingzhuo Yu, Computer Science, Northeastern University, MA, USA

DOI:

https://doi.org/10.69987/JACS.2026.60404

Keywords:

alternative data, feature engineering, thin-file credit scoring, financial inclusion

Abstract

Approximately 45 million adults in the United States lack sufficient credit history for conventional scoring, limiting their access to fair lending opportunities. Non-traditional data sources—including behavioral payment patterns, temporal transaction sequences, and relational signals—present promising avenues for assessing these thin-file borrowers, yet the relative predictive contribution of each feature category remains unclear. This study conducts a systematic empirical comparison of feature engineering strategies derived from non-traditional data on the Home Credit Default Risk dataset (307,511 applications across seven linked tables). We define a taxonomy of three feature categories—behavioral, temporal, and relational—and evaluate each through ablation analysis on thin-file and thick-file borrower segments using LightGBM. Results indicate that behavioral features yield the largest marginal AUC-ROC improvement (+0.0472) for thin-file borrowers, exceeding the corresponding gain for thick-file borrowers (+0.0212) by a factor of 2.23. The combined non-traditional feature set raises thin-file AUC-ROC from 0.6651 to 0.7408, narrowing the performance gap relative to thick-file borrowers by 38.2%. Fairness analysis reveals that behavioral and temporal features modestly reduce equalized odds disparities across gender and age groups, while relational features introduce slight increases in demographic gaps. These findings provide actionable guidance for lenders seeking to expand credit access through responsible alternative data utilization.
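
The headline figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the four numbers quoted in the abstract, and the constant and function names are assumptions introduced here, not part of the authors' code. It recomputes the thin-file to thick-file gain ratio and the absolute AUC-ROC lift from the combined feature set.

```python
# Figures quoted in the abstract; the names below are hypothetical labels.
THIN_BEHAVIORAL_GAIN = 0.0472   # marginal AUC-ROC gain, thin-file segment
THICK_BEHAVIORAL_GAIN = 0.0212  # marginal AUC-ROC gain, thick-file segment
THIN_BASELINE_AUC = 0.6651      # thin-file AUC-ROC before non-traditional features
THIN_COMBINED_AUC = 0.7408      # thin-file AUC-ROC with the combined feature set


def gain_ratio(thin_gain: float, thick_gain: float) -> float:
    """How many times larger the thin-file gain is than the thick-file gain."""
    return thin_gain / thick_gain


def absolute_lift(baseline_auc: float, combined_auc: float) -> float:
    """Absolute AUC-ROC improvement from baseline to combined features."""
    return combined_auc - baseline_auc


ratio = gain_ratio(THIN_BEHAVIORAL_GAIN, THICK_BEHAVIORAL_GAIN)
lift = absolute_lift(THIN_BASELINE_AUC, THIN_COMBINED_AUC)

print(f"behavioral gain ratio (thin / thick): {ratio:.2f}")  # 2.23
print(f"combined thin-file AUC-ROC lift: {lift:.4f}")        # 0.0757
```

The ratio agrees with the reported factor of 2.23. The 38.2% gap reduction cannot be recomputed here because the abstract does not state the thick-file AUC-ROC values.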

Published

2026-04-11

How to Cite

Zhi Luo, & Mingzhuo Yu. (2026). An Empirical Comparison of Feature Engineering Strategies from Non-Traditional Data for Thin-File Borrower Credit Assessment. Journal of Advanced Computing Systems, 6(4), 50-59. https://doi.org/10.69987/JACS.2026.60404