Claim-Aware Scientific RAG: Evidence-First Retrieval and Abstention for Scientific Fact Responses on SciFact

Authors

  • Jing Chen, Industrial Engineering and Operations Research, UCB, CA, USA
  • Xinzhuo Sun, Computer Science, Cornell Tech, NY, USA
  • Vincent Brown, Information Technology, Illinois Tech, IL, USA

DOI:

https://doi.org/10.69987/JACS.2023.30102

Keywords:

RAG, fact verification, evidence-grounded generation, hallucination reduction, SciFact, BEIR, abstention, reranking, hybrid retrieval

Abstract

Retrieval-augmented generation (RAG) is widely adopted to reduce hallucinations, yet most systems still answer even when retrieval fails, producing fluent but unsupported “scientific facts”. This paper studies a claim-aware scientific RAG design principle: the system is allowed to answer only when it can cite evidence. We conduct full experimental evaluations on the SciFact scientific claim retrieval task using the BEIR-style SciFact split (5,183 abstracts; 809 training claims; 300 test claims). We compare a sparse BM25 retriever, a contrastive dense dual-encoder, and a hybrid retriever using reciprocal rank fusion (RRF), followed by an interaction-based reranker. We then add an evidence layer that extracts candidate citation sentences and scores them with a lightweight verifier, and we enforce an abstention gate that refuses to answer when confidence is low. On the SciFact test set, BM25 achieves nDCG@10=0.662 and Recall@100=0.883. The dense retriever alone underperforms (nDCG@10=0.537), but hybrid RRF improves Recall@100 to 0.923 and a reranker recovers nDCG@10 to 0.659. For evidence extraction, token-level evidence F1 reaches 0.190 when selecting two sentences. Finally, we quantify a refusal–hallucination tradeoff via confidence-based abstention: gating by the top-1 BM25 score reduces the rate of answers without any relevant abstract in the top-10 from 0.193 to 0.047 at 28.3% answer coverage. These results provide a reproducible baseline showing how evidence-first retrieval and calibrated refusal can be combined to control hallucinations in scientific RAG.
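The two mechanisms central to the abstract — reciprocal rank fusion over sparse and dense rankings, and a confidence-based abstention gate on the top-1 retrieval score — can be sketched briefly. This is an illustrative sketch only: the function names, the RRF constant `k=60`, and the gate threshold are assumptions, not values reported in the paper.

```python
# Illustrative sketch of reciprocal rank fusion (RRF) and a
# confidence-based abstention gate, as described in the abstract.
# All names and thresholds here are hypothetical, not the paper's.

def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1/(k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Higher fused score = better; ties broken arbitrarily.
    return sorted(scores, key=scores.get, reverse=True)

def should_answer(top1_bm25_score, threshold=20.0):
    """Abstention gate: answer only when the top-1 retrieval score is high enough."""
    return top1_bm25_score >= threshold

# Toy rankings from a sparse and a dense retriever.
bm25_rank = ["d3", "d1", "d7"]
dense_rank = ["d1", "d9", "d3"]
fused = rrf_fuse([bm25_rank, dense_rank])  # "d1" ranks first: it appears high in both lists
```

Sweeping the gate threshold traces out the refusal–hallucination tradeoff the abstract quantifies: a stricter threshold lowers the rate of unsupported answers at the cost of answer coverage.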

Published

2023-01-07

How to Cite

Jing Chen, Xinzhuo Sun, & Vincent Brown. (2023). Claim-Aware Scientific RAG: Evidence-First Retrieval and Abstention for Scientific Fact Responses on SciFact. Journal of Advanced Computing Systems, 3(1), 16-30. https://doi.org/10.69987/JACS.2023.30102
