Improving Classification Accuracy for Unstructured Medical Documents via Multi-Engine OCR and Deep Learning Collaboration

Authors

  • Qiaomu Zhang, Computer Science, Rice University, TX, USA

DOI:

https://doi.org/10.69987/JACS.2026.60201

Keywords:

Medical document classification, Multi-engine OCR, Ensemble deep learning, Healthcare information extraction

Abstract

The exponential growth of unstructured medical documents poses significant challenges for healthcare information management. This study presents a novel multi-engine collaborative framework that integrates diverse optical character recognition (OCR) technologies with ensemble deep learning classifiers to improve document classification accuracy. The proposed approach adaptively selects the most suitable OCR engines based on document characteristics, extracts multi-source textual features, and combines classifier outputs through confidence-weighted ensemble strategies. In an experimental evaluation on a healthcare document dataset, the framework achieves 94.7% classification accuracy across clinical notes, diagnostic reports, laboratory results, insurance claims, and prescription forms, outperforming the strongest single-engine baseline (Engine-H) by 11.6 percentage points. It maintains an average processing time of 2.4 seconds per document while reducing computational cost compared with parallel multi-engine execution. These findings support the effectiveness of multi-engine collaboration for heterogeneous medical documentation systems.
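
As a rough illustration of the confidence-weighted ensemble idea mentioned in the abstract, the Python sketch below sums each classifier's confidence per predicted label and picks the label with the highest total. The function name, the (label, confidence) input format, and the summed-confidence rule are assumptions made here for illustration; the abstract does not specify the weighting scheme actually used in the study.

```python
from collections import defaultdict


def confidence_weighted_vote(predictions):
    """Combine classifier outputs into a single label (illustrative only).

    predictions: list of (label, confidence) pairs, one per classifier,
    with confidence assumed to lie in [0, 1].
    Returns (winning_label, share_of_total_confidence).
    """
    if not predictions:
        raise ValueError("need at least one prediction")

    # Accumulate confidence mass per label, so a confident classifier
    # counts for more than an uncertain one.
    scores = defaultdict(float)
    for label, confidence in predictions:
        scores[label] += confidence

    total = sum(scores.values())
    if total == 0:
        raise ValueError("all confidences are zero")

    winner = max(scores, key=scores.get)
    return winner, scores[winner] / total


# Hypothetical outputs from three classifiers for one document.
label, share = confidence_weighted_vote(
    [("lab_result", 0.9), ("clinical_note", 0.6), ("lab_result", 0.5)]
)
print(label, round(share, 2))
```

In this made-up example two classifiers agree on the same label, so that label wins with most of the total confidence; a single highly confident classifier could still outvote two uncertain ones under this rule.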

Published

2026-02-03

How to Cite

Qiaomu Zhang. (2026). Improving Classification Accuracy for Unstructured Medical Documents via Multi-Engine OCR and Deep Learning Collaboration. Journal of Advanced Computing Systems, 6(2), 1-14. https://doi.org/10.69987/JACS.2026.60201
