Risk-Calibrated Biomedical Search: Calibrated Selection of LLM-Style Query Expansions on BEIR TREC-COVID

Authors

  • Jing Chen, Industrial Engineering and Operations Research, UCB, CA, USA
  • Xinzhuo Sun, Computer Science, Cornell Tech, NY, USA
  • Qiyou Wu, Artificial Intelligence, Northeastern University, MA, USA
  • Matt Jackson, Data Science, University of Pittsburgh, PA, USA

DOI:

https://doi.org/10.69987/JACS.2024.40406

Keywords:

query expansion, uncertainty calibration, robust retrieval, selective prediction, biomedical information retrieval, TREC-COVID, BEIR, coverage–risk trade-off

Abstract

Query expansion is a long-standing technique for closing vocabulary gaps between short user queries and long biomedical documents. Large language models (LLMs) have recently renewed interest in expansion by generating fluent synonym lists, MeSH-style descriptors, and drug aliases; however, aggressive generation can introduce query drift, causing large per-topic failures that are unacceptable in high-stakes biomedical search. This paper presents Risk-Calibrated Query Expansion (RCQE), a selective expansion framework that treats expansion as a risk-aware decision: for each query we generate multiple plausible expansion candidates and learn a calibrated selector that either (i) chooses a candidate expected to improve retrieval, or (ii) abstains and keeps the original query. We conduct full experiments on BEIR TREC-COVID (171,332 documents; 50 topics; 66,336 judged query-document pairs) using a reproducible BM25 implementation. Across topics, a naive always-expand strategy improves average nDCG@10 from 0.549 to 0.580 but harms 20% of topics, including catastrophic failures. RCQE improves average nDCG@10 to 0.613 and MAP to 0.213 under 5-fold cross-validation while reducing the conditional harm probability among expanded topics from 0.20 to 0.13 at 46% coverage. Coverage–risk curves show that tightening the calibrated acceptance threshold yields monotonic risk reductions with graceful degradation in effectiveness. These results demonstrate that uncertainty calibration is a practical control knob for robust biomedical query expansion.
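The select-or-abstain decision and the coverage–risk trade-off described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Candidate` type, the `p_improve` field (a calibrated probability that an expansion helps, assumed to come from some already-trained selector), and both function names are hypothetical, and the numbers in the usage lines are made up for illustration rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Candidate:
    """Hypothetical container for one expansion candidate."""
    text: str         # expanded query string
    p_improve: float  # assumed calibrated probability that this expansion helps


def select_or_abstain(original: str,
                      candidates: Sequence[Candidate],
                      threshold: float) -> str:
    """Return the highest-scoring candidate if it clears the acceptance
    threshold; otherwise abstain and keep the original query."""
    if not candidates:
        return original
    best = max(candidates, key=lambda c: c.p_improve)
    return best.text if best.p_improve >= threshold else original


def coverage_and_risk(best_scores: Sequence[float],
                      harmed: Sequence[bool],
                      threshold: float) -> Tuple[float, float]:
    """Coverage is the fraction of topics that get expanded; risk is the
    fraction of those expanded topics whose effectiveness was harmed."""
    expanded = [h for s, h in zip(best_scores, harmed) if s >= threshold]
    coverage = len(expanded) / len(best_scores) if best_scores else 0.0
    risk = sum(expanded) / len(expanded) if expanded else 0.0
    return coverage, risk


# Illustrative (made-up) per-topic scores and harm labels.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
harmed = [False, False, True, False, True]
for t in (0.5, 0.7):
    print(t, coverage_and_risk(scores, harmed, t))
```

Raising the threshold in this toy example expands fewer topics and excludes the one harmful expansion that a looser threshold would accept, which is the kind of trade-off a coverage–risk curve summarizes.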

Author Biography

  • Matt Jackson, Data Science, University of Pittsburgh, PA, USA

Published

2024-04-18

How to Cite

Jing Chen, Xinzhuo Sun, Qiyou Wu, & Matt Jackson. (2024). Risk-Calibrated Biomedical Search: Calibrated Selection of LLM-Style Query Expansions on BEIR TREC-COVID. Journal of Advanced Computing Systems, 4(4), 61-79. https://doi.org/10.69987/JACS.2024.40406
