Comparative Analysis of Filter-based Feature Selection Methods for High-Dimensional Data in Classification Tasks

Authors

  • Shengjie Min, Statistics, The University of Georgia, GA, USA
  • Chuanli Wei, Computer Science, University of Southern California, CA, USA

DOI:

https://doi.org/10.69987/JACS.2023.30803

Keywords:

Feature selection, high-dimensional data classification, feature filtering methods, dimensionality reduction

Abstract

High-dimensional data classification faces substantial computational barriers when the number of features exceeds the sample size by orders of magnitude. Filter-based feature selection addresses this curse of dimensionality by decoupling feature evaluation from classifier training. This study examines six widely used filter methods on datasets ranging from 10³ to 10⁵ dimensions, measuring their impact on classification accuracy, computational overhead, and feature subset stability. Experimental results show that correlation-based approaches achieve 8.7% higher accuracy than variance thresholding on bioinformatics datasets while maintaining O(n log n) time complexity. The chi-square test and mutual information exhibit comparable performance on categorical data but diverge on continuous features. The analysis reveals trade-offs between statistical power and computational tractability, with F-score emerging as the best choice for balanced datasets and ReliefF excelling under class imbalance. Correlation methods degrade beyond 10⁴ features because of spurious associations, which suggests hybrid architectures for ultra-high-dimensional problems.
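To make the filter idea concrete, the following is a minimal sketch of two of the scoring criteria named in the abstract, variance and F-score, each computed per feature without training any classifier. It is an illustration under stated assumptions, not the authors' implementation: the function names (`variance_scores`, `f_scores`, `top_k`) and the toy data are hypothetical, and "F-score" is assumed here to mean the one-way ANOVA F-statistic.

```python
from statistics import mean, pvariance


def variance_scores(X):
    """Unsupervised filter: score each feature (column) by its population variance."""
    return [pvariance(col) for col in zip(*X)]


def f_scores(X, y):
    """Supervised filter (assumed one-way ANOVA F): between-class over within-class variance."""
    classes = sorted(set(y))
    n, k = len(y), len(classes)
    scores = []
    for col in zip(*X):
        grand = mean(col)
        groups = [[v for v, label in zip(col, y) if label == c] for c in classes]
        between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups) / (k - 1)
        within = sum((v - mean(g)) ** 2 for g in groups for v in g) / (n - k)
        if within > 0:
            scores.append(between / within)
        else:
            # No within-class spread: perfect separation, or a constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring features, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical toy data: feature 0 separates the classes,
# feature 1 is class-independent noise, feature 2 is constant.
X = [
    [1.0, 5.0, 3.0],
    [1.2, 2.0, 3.0],
    [0.9, 4.0, 3.0],
    [3.1, 5.0, 3.0],
    [2.9, 2.0, 3.0],
    [3.0, 4.0, 3.0],
]
y = [0, 0, 0, 1, 1, 1]

by_variance = top_k(variance_scores(X), k=1)
by_f_score = top_k(f_scores(X, y), k=1)
```

On this toy data the F-score ranks feature 0 first, while variance ranks the noisy feature 1 first, because variance ignores the class labels. Both criteria score features one at a time, which is what keeps filter methods cheap compared with approaches that retrain a classifier for each candidate subset.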

Published

2023-08-09

How to Cite

Shengjie Min, & Chuanli Wei. (2023). Comparative Analysis of Filter-based Feature Selection Methods for High-Dimensional Data in Classification Tasks. Journal of Advanced Computing Systems, 3(8), 25-38. https://doi.org/10.69987/JACS.2023.30803