Risk-Calibrated Biomedical Search: Calibrated Selection of LLM-Style Query Expansions on BEIR TREC-COVID
DOI: https://doi.org/10.69987/JACS.2024.40406

Keywords: query expansion, uncertainty calibration, robust retrieval, selective prediction, biomedical information retrieval, TREC-COVID, BEIR, coverage–risk trade-off

Abstract
Query expansion is a long-standing technique for closing vocabulary gaps between short user queries and long biomedical documents. Large language models (LLMs) have recently renewed interest in expansion by generating fluent synonym lists, MeSH-style descriptors, and drug aliases; however, aggressive generation can introduce query drift, causing large per-topic failures that are unacceptable in high-stakes biomedical search. This paper presents Risk-Calibrated Query Expansion (RCQE), a selective expansion framework that treats expansion as a risk-aware decision: for each query we generate multiple plausible expansion candidates and learn a calibrated selector that either (i) chooses a candidate expected to improve retrieval, or (ii) abstains and keeps the original query. We conduct full experiments on BEIR TREC-COVID (171,332 documents; 50 topics; 66,336 judged query-document pairs) using a reproducible BM25 implementation. Across topics, a naive always-expand strategy improves average nDCG@10 from 0.549 to 0.580 but harms 20% of topics, including catastrophic failures. RCQE improves average nDCG@10 to 0.613 and MAP to 0.213 under 5-fold cross-validation while reducing the conditional harm probability among expanded topics from 0.20 to 0.13 at 46% coverage. Coverage–risk curves show that tightening the calibrated acceptance threshold yields monotonic risk reductions with graceful degradation in effectiveness. These results demonstrate that uncertainty calibration is a practical control knob for robust biomedical query expansion.
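The risk-aware decision described in the abstract — score each expansion candidate with a calibrated probability of improving retrieval, and abstain when no candidate clears an acceptance threshold — can be sketched as follows. This is a minimal illustration, not the paper's implementation; the names `predict_improve_prob` and `tau` are hypothetical stand-ins for the learned calibrated selector and its acceptance threshold.

```python
def select_query(original_query, candidates, predict_improve_prob, tau=0.5):
    """Risk-calibrated selective expansion (illustrative sketch).

    predict_improve_prob(q, c) is assumed to return a calibrated
    probability that replacing query q with candidate expansion c
    improves retrieval (e.g., nDCG@10). tau is the acceptance
    threshold that trades coverage against conditional harm risk.
    """
    best, best_p = None, 0.0
    for cand in candidates:
        p = predict_improve_prob(original_query, cand)
        if p > best_p:
            best, best_p = cand, p
    if best is not None and best_p >= tau:
        return best          # expand: candidate expected to help
    return original_query    # abstain: keep the original query
```

Raising `tau` shrinks coverage (fewer topics are expanded) while reducing the probability of harming an expanded topic, which is the coverage–risk trade-off the paper's curves quantify.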