An Empirical Comparison of Feature Engineering Strategies from Non-Traditional Data for Thin-File Borrower Credit Assessment
DOI:
https://doi.org/10.69987/JACS.2026.60404
Keywords:
alternative data, feature engineering, thin-file credit scoring, financial inclusion
Abstract
Approximately 45 million adults in the United States lack sufficient credit history for conventional scoring, limiting their access to fair lending opportunities. Non-traditional data sources—including behavioral payment patterns, temporal transaction sequences, and relational signals—present promising avenues for assessing these thin-file borrowers, yet the relative predictive contribution of each feature category remains unclear. This study conducts a systematic empirical comparison of feature engineering strategies derived from non-traditional data on the Home Credit Default Risk dataset (307,511 applications across seven linked tables). We define a taxonomy of three feature categories—behavioral, temporal, and relational—and evaluate each through ablation analysis on thin-file and thick-file borrower segments using LightGBM. Results indicate that behavioral features yield the largest marginal AUC-ROC improvement (+0.0472) for thin-file borrowers, exceeding the corresponding gain for thick-file borrowers (+0.0212) by a factor of 2.23. The combined non-traditional feature set raises thin-file AUC-ROC from 0.6651 to 0.7408, narrowing the performance gap relative to thick-file borrowers by 38.2%. Fairness analysis reveals that behavioral and temporal features modestly reduce equalized odds disparities across gender and age groups, while relational features introduce slight increases in demographic gaps. These findings provide actionable guidance for lenders seeking to expand credit access through responsible alternative data utilization.
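
The arithmetic behind the headline figures can be checked with a minimal sketch. It uses only the AUC-ROC values quoted in the abstract. The helper names (`marginal_gain`, `gain_ratio`) are illustrative assumptions and do not come from the study's code.

```python
# AUC-ROC figures as quoted in the abstract (LightGBM, Home Credit Default Risk).
THIN_BEHAVIORAL_GAIN = 0.0472   # marginal gain from behavioral features, thin-file
THICK_BEHAVIORAL_GAIN = 0.0212  # marginal gain from behavioral features, thick-file
THIN_AUC_BASELINE = 0.6651      # thin-file AUC-ROC without non-traditional features
THIN_AUC_COMBINED = 0.7408      # thin-file AUC-ROC with the combined feature set


def marginal_gain(auc_with: float, auc_without: float) -> float:
    """Ablation-style gain: AUC with a feature group minus AUC without it."""
    return auc_with - auc_without


def gain_ratio(thin_gain: float, thick_gain: float) -> float:
    """How many times larger the thin-file gain is than the thick-file gain."""
    return thin_gain / thick_gain


ratio = gain_ratio(THIN_BEHAVIORAL_GAIN, THICK_BEHAVIORAL_GAIN)
combined_gain = marginal_gain(THIN_AUC_COMBINED, THIN_AUC_BASELINE)

print(f"behavioral gain ratio (thin / thick): {ratio:.2f}")
print(f"combined thin-file AUC-ROC gain: {combined_gain:.4f}")
```

The ratio rounds to the factor of 2.23 reported above, and the combined feature set corresponds to an absolute thin-file gain of about 0.0757 AUC-ROC. The 38.2% gap reduction cannot be recomputed here because the abstract does not state the thick-file AUC-ROC values.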







