Improving Classification Accuracy for Unstructured Medical Documents via Multi-Engine OCR and Deep Learning Collaboration
DOI:
https://doi.org/10.69987/JACS.2026.60201
Keywords:
Medical document classification, Multi-engine OCR, Ensemble deep learning, Healthcare information extraction
Abstract
The exponential growth of unstructured medical documents poses significant challenges for healthcare information management. This study presents a novel multi-engine collaborative framework that integrates diverse optical character recognition (OCR) technologies with ensemble deep learning classifiers to improve document classification accuracy. The proposed approach adaptively selects the optimal OCR engine based on document characteristics, extracts multi-source textual features, and employs confidence-weighted ensemble strategies. In an experimental evaluation on a healthcare document dataset, the framework achieves 94.7% classification accuracy across five document types (clinical notes, diagnostic reports, laboratory results, insurance claims, and prescription forms), outperforming the strongest single-engine baseline (Engine-H) by 11.6 percentage points. The framework maintains an average processing time of 2.4 seconds per document while reducing computational cost compared with parallel multi-engine execution. These findings validate the effectiveness of multi-engine collaboration for heterogeneous medical documentation systems.







