A Comparative Study on Large Language Models' Accuracy in Cross-lingual Professional Terminology Processing: An Evaluation Across Multiple Domains

Authors

  • Hanlu Zhang, Master of Science in Computer Science, Stevens Institute of Technology, NJ, USA
  • Wenyan Liu, Electrical & Computer Engineering, Carnegie Mellon University, PA, USA

DOI:

https://doi.org/10.69987/

Keywords:

Cross-lingual terminology, Large language models, Professional translation, Multilingual evaluation

Abstract

Cross-lingual professional terminology processing presents significant challenges for large language models (LLMs) due to the complexity and domain-specific nature of specialized vocabularies. This study conducts a comprehensive evaluation of five state-of-the-art LLMs on terminology translation tasks in four professional domains: medical, legal, engineering, and financial. We developed a multi-domain terminology dataset containing 2,400 professional terms with human-annotated translations in six language pairs. Our experimental framework employs multiple evaluation metrics, including BLEU scores, semantic similarity measures, and domain-expert assessments. Results reveal substantial performance variations across domains and language pairs, with accuracy ranging from 67.3% to 89.6%. Medical terminology achieved the highest translation accuracy, while legal terminology presented the greatest challenges. Cross-lingual semantic consistency varied significantly between model architectures, with transformer-based models demonstrating superior performance in maintaining semantic integrity. Error pattern analysis identified three primary failure modes: contextual ambiguity resolution, morphological variation handling, and domain-specific concept mapping. These findings provide critical insights for improving LLM performance in specialized translation applications and highlight the need for domain-adaptive training approaches in multilingual terminology processing systems.
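To make the evaluation setup concrete, the sketch below illustrates how term-level BLEU and cross-lingual semantic similarity could be computed for a batch of terminology translations. It is a minimal illustration under stated assumptions, not the authors' actual evaluation code: the sacrebleu and sentence-transformers libraries, the sample term triples, and the 0.85 similarity threshold are all illustrative choices.

    import sacrebleu
    from sentence_transformers import SentenceTransformer, util

    # Hypothetical (source term, reference translation, model output) triples;
    # the study itself uses 2,400 human-annotated terms across six language pairs.
    samples = [
        ("myocardial infarction", "infarto de miocardio", "infarto de miocardio"),
        ("force majeure",         "fuerza mayor",         "causa de fuerza mayor"),
    ]

    hypotheses = [hyp for _, _, hyp in samples]
    references = [ref for _, ref, _ in samples]

    # Corpus-level BLEU over the candidate terminology translations.
    bleu = sacrebleu.corpus_bleu(hypotheses, [references])

    # Cross-lingual semantic similarity via a multilingual sentence encoder.
    encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
    hyp_emb = encoder.encode(hypotheses, convert_to_tensor=True)
    ref_emb = encoder.encode(references, convert_to_tensor=True)
    cosine = util.cos_sim(hyp_emb, ref_emb).diagonal()

    # Count a term as accurately translated if it clears an assumed threshold.
    THRESHOLD = 0.85  # illustrative cut-off, not taken from the paper
    accuracy = (cosine >= THRESHOLD).float().mean().item()

    print(f"BLEU: {bleu.score:.1f}")
    print(f"Mean semantic similarity: {cosine.mean().item():.3f}")
    print(f"Term accuracy @ {THRESHOLD}: {accuracy:.1%}")

The third metric named in the abstract, domain-expert assessment, is a manual annotation step and is not reproduced in this sketch.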

Author Biography

  • Wenyan Liu, Electrical & Computer Engineering, Carnegie Mellon University, PA, USA

Published

2024-10-16

How to Cite

Hanlu Zhang, & Wenyan Liu. (2024). A Comparative Study on Large Language Models’ Accuracy in Cross-lingual Professional Terminology Processing: An Evaluation Across Multiple Domains. Journal of Advanced Computing Systems, 4(10), 55-68. https://doi.org/10.69987/
