An Empirical Evaluation of Oversampling-Ensemble Interactions Under Varying Imbalance Ratios for Tabular Data Classification

Wenlan Wei; Zhengchun Shang

doi:10.69987/AIMLR.2026.70205

Authors

Wenlan Wei, Computer Science, Cornell University, Ithaca, NY, USA
Zhengchun Shang, Electrical and Computer Engineering, Cornell University, Ithaca, NY, USA

DOI:

https://doi.org/10.69987/AIMLR.2026.70205

Keywords:

class imbalance, oversampling, ensemble learning, tabular data, imbalance ratio

Abstract

Class imbalance represents one of the most persistent and practically consequential challenges in supervised machine learning, arising whenever the minority class—typically the class of primary interest—constitutes only a small fraction of the total training population. Existing research has examined oversampling techniques and ensemble methods as independent remedies, yet their interaction under systematically varying imbalance severity remains insufficiently characterized. This study presents a structured empirical evaluation of six oversampling strategies—SMOTE, Borderline-SMOTE, ADASYN, K-Means SMOTE, SMOTEENN, and SVMSMOTE—combined with four ensemble classifiers—Random Forest, XGBoost, LightGBM, and EasyEnsemble—across ten publicly available tabular benchmark datasets drawn from the KEEL repository, the UCI Machine Learning Repository, and Kaggle. Datasets are partitioned into four imbalance ratio groups: low (IR < 5), medium (5–10), high (10–50), and extreme (IR > 50). Performance is assessed using F1-score, G-mean, AUC-ROC, and AUPRC under stratified ten-fold cross-validation. Experimental results reveal that no single oversampling-ensemble pairing dominates uniformly across all imbalance levels. SMOTEENN combined with EasyEnsemble achieves the strongest overall performance in high and extreme imbalance scenarios, while classifier reweighting without explicit oversampling proves adequate under low imbalance. These findings offer practical guidance for practitioners selecting preprocessing-ensemble pipelines commensurate with observed dataset imbalance severity.
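
The grouping by imbalance ratio and two of the reported metrics can be illustrated with a minimal sketch. It is not the authors' code. It assumes the usual definition of imbalance ratio (majority-class count divided by minority-class count), which the abstract does not spell out, and it assumes that a ratio of exactly 10 falls in the medium group, since the stated ranges share that boundary. The function names `imbalance_ratio`, `ir_group`, and `f1_and_gmean` are hypothetical.

```python
from collections import Counter
from math import sqrt


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    return max(counts.values()) / min(counts.values())


def ir_group(ir):
    """Map a ratio to the four groups named in the abstract.

    Assumption: a ratio of exactly 10 is treated as medium; the abstract
    lists 10 in both the medium and the high range.
    """
    if ir < 5:
        return "low"
    if ir <= 10:
        return "medium"
    if ir <= 50:
        return "high"
    return "extreme"


def f1_and_gmean(y_true, y_pred, positive=1):
    """F1-score and G-mean for a binary task, computed from confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0

    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    gmean = sqrt(recall * specificity)  # geometric mean of per-class recalls
    return f1, gmean


# Toy labels: 95 majority (0) and 5 minority (1) samples.
y_true = [0] * 95 + [1] * 5
ratio = imbalance_ratio(y_true)
print(ratio, ir_group(ratio))  # 19.0 high

# A classifier that always predicts the majority class is 95% accurate,
# yet both minority-sensitive metrics collapse to zero.
always_majority = [0] * 100
print(f1_and_gmean(y_true, always_majority))  # (0.0, 0.0)

# A classifier that finds 4 of 5 minority samples at the cost of 5 false alarms.
y_pred = [0] * 90 + [1] * 5 + [1] * 4 + [0]
print(f1_and_gmean(y_true, y_pred))
```

The always-majority case shows why accuracy is not among the reported metrics: it stays high under severe imbalance while F1-score and G-mean expose the missed minority class.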

Author Biography

Zhengchun Shang, Electrical and Computer Engineering, Cornell University, Ithaca, NY, USA
