Developing Evaluation Metrics for Cross-lingual LLM-based Detection of Subtle Sentiment Manipulation in Online Financial Content
DOI: https://doi.org/10.69987/JACS.2023.30903

Keywords: Cross-lingual Sentiment Analysis, Financial Manipulation Detection, Evaluation Metrics, Large Language Models

Abstract
This paper addresses the challenge of evaluating cross-lingual Large Language Models (LLMs) for detecting subtle sentiment manipulation in online financial content. While LLMs demonstrate promising capabilities in cross-lingual transfer learning, standard evaluation methods fail to adequately assess their performance in identifying nuanced manipulation techniques across linguistic boundaries. We propose a comprehensive evaluation metrics framework specifically designed for cross-lingual financial sentiment analysis that extends beyond traditional binary classification metrics. The framework incorporates three metric families: linguistic fidelity and cultural context preservation metrics, manipulation detection precision metrics, and cross-lingual transfer efficiency measurements. Experiments conducted across five languages (English, Chinese, Arabic, Spanish, and German) using multiple model architectures demonstrate that Knowledge-Enhanced Adversarial Models outperform traditional approaches by up to 27.3% on manipulation-specific metrics. We developed a multi-layered dataset of 38,252 annotated samples spanning diverse financial domains and manipulation techniques. Our evaluation reveals significant performance variations correlated with linguistic distance and cultural context, with the proposed metrics providing a more sensitive assessment of cross-lingual capabilities than traditional measures. This framework enables standardized evaluation of subtle manipulation detection across languages, supporting practical applications in regulatory monitoring, investor protection, and cross-border market surveillance.
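To make the three metric families concrete, the sketch below shows one way a composite evaluation score could be assembled from them. All names, value ranges, and weights here are illustrative assumptions, not definitions taken from the paper; the actual framework's formulas are specified in the body of the work.

```python
# Illustrative sketch only: the field names, [0, 1] ranges, and weights are
# hypothetical assumptions, not the paper's actual metric definitions.
from dataclasses import dataclass


@dataclass
class CrossLingualEvalScores:
    linguistic_fidelity: float      # linguistic fidelity / cultural context family
    manipulation_precision: float   # manipulation detection precision family
    transfer_efficiency: float      # cross-lingual transfer efficiency family


def composite_score(s: CrossLingualEvalScores,
                    weights=(0.3, 0.4, 0.3)) -> float:
    """Weighted average of the three metric families (weights are illustrative)."""
    parts = (s.linguistic_fidelity, s.manipulation_precision, s.transfer_efficiency)
    return sum(w * p for w, p in zip(weights, parts))


# Example: a model strong on manipulation detection but weaker on transfer.
scores = CrossLingualEvalScores(0.85, 0.90, 0.70)
print(round(composite_score(scores), 3))  # 0.3*0.85 + 0.4*0.90 + 0.3*0.70 = 0.825
```

A weighted average is only one aggregation choice; a framework of this kind might instead report the families separately so that weaknesses in one family (e.g. transfer efficiency for linguistically distant pairs) are not masked by strengths in another.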