Improving Classification Accuracy for Unstructured Medical Documents via Multi-Engine OCR and Deep Learning Collaboration

Qiaomu Zhang

doi:10.69987/JACS.2026.60201

Authors

Qiaomu Zhang, Computer Science, Rice University, TX, USA

Keywords:

Medical document classification, Multi-engine OCR, Ensemble deep learning, Healthcare information extraction

Abstract

The exponential growth of unstructured medical documents poses significant challenges for healthcare information management. This study presents a novel multi-engine collaborative framework that integrates diverse optical character recognition (OCR) technologies with ensemble deep learning classifiers to improve document classification accuracy. The proposed approach adaptively selects the most suitable OCR engines based on document characteristics, extracts textual features from multiple sources, and employs confidence-weighted ensemble strategies. In an experimental evaluation on a healthcare document dataset, the framework achieves 94.7% classification accuracy across five document types (clinical notes, diagnostic reports, laboratory results, insurance claims, and prescription forms), outperforming the strongest single-engine baseline (Engine-H) by 11.6 percentage points. It maintains an average processing time of 2.4 seconds per document while reducing computational cost compared with parallel multi-engine execution. These findings validate the effectiveness of multi-engine collaboration for heterogeneous medical documentation systems.
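
As a rough illustration of what a confidence-weighted ensemble over several OCR pipelines might look like, the following minimal Python sketch combines per-engine class probabilities, weighting each by that engine's OCR confidence. The abstract does not specify the weighting scheme, so the function name `confidence_weighted_vote`, the class labels, and the use of a normalized weighted average are all assumptions made for illustration, not the method used in the study.

```python
from typing import Dict, List, Tuple

# Hypothetical label names for the five document types listed in the abstract.
CLASSES = [
    "clinical_note",
    "diagnostic_report",
    "laboratory_result",
    "insurance_claim",
    "prescription_form",
]


def confidence_weighted_vote(
    predictions: List[Tuple[float, Dict[str, float]]],
) -> Tuple[str, Dict[str, float]]:
    """Combine per-engine class probabilities using OCR confidence as weight.

    Each item in `predictions` is (ocr_confidence, class_probabilities) for
    one OCR engine plus classifier pipeline. This weighting rule is an
    assumption for illustration only.
    """
    total_weight = sum(conf for conf, _ in predictions)
    if total_weight <= 0:
        raise ValueError("at least one prediction needs a positive confidence")

    # Weighted sum of probabilities per class; missing classes count as 0.
    combined = {c: 0.0 for c in CLASSES}
    for conf, probs in predictions:
        for c in CLASSES:
            combined[c] += conf * probs.get(c, 0.0)

    # Normalize by the total weight so the result is a weighted average.
    combined = {c: score / total_weight for c, score in combined.items()}
    label = max(combined, key=combined.get)
    return label, combined


# Two hypothetical engines disagree; the higher-confidence one dominates.
example = [
    (0.9, {"clinical_note": 0.2, "laboratory_result": 0.8}),
    (0.3, {"clinical_note": 0.9, "laboratory_result": 0.1}),
]
label, scores = confidence_weighted_vote(example)
print(label)
```

In this sketch a low-confidence OCR result still contributes to the decision but cannot easily override a high-confidence one, which is one plausible reading of "confidence-weighted"; other schemes, such as learned weights or stacking, would fit the description equally well.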