Evidence-Chain Reliable RAG: Word-Level Hallucination Detection, Source Attribution, and Provenance Explanation for LLM Applications
DOI: https://doi.org/10.69987/JACS.2024.40207
Keywords: retrieval-augmented generation, hallucination detection, word-level annotation, source attribution, provenance explanation, evidence chain, reliable language models
Abstract
Retrieval-augmented generation (RAG) reduces unsupported generation by conditioning a language model on external evidence, yet generated answers can still contain factual claims that are absent from, or contradicted by, the retrieved sources. This paper presents an evidence-chain detector for reliable RAG applications. The detector aligns a generated answer with retrieved passages or structured records, computes lexical, sparse-retrieval, and rule-based entailment features, and predicts hallucination at the word and sentence levels while returning the evidence chunk used for attribution. All numerical results in this manuscript were produced by the packaged experiment script with random seed 17; none are illustrative placeholders. The execution artifact contains a RAGTruth-compatible audit corpus with 140 source items and six generated-style responses per source, for a total of 840 responses, 2,250 sentences, and 23,284 evaluated word tokens. The test split contains 252 responses and 286 hallucinated tokens. EvidenceChain-RF achieved the strongest measured word-level score, with precision 0.468, recall 0.654, F1 0.545, ROC-AUC 0.919, and PR-AUC 0.357. At the sentence level, the same detector reached precision 0.952, recall 1.000, and F1 0.975. The results show that explicit evidence matching, mismatch features, and supervised calibration detect hallucinated spans more reliably than lexical overlap alone. The package also includes an official-schema loader for response.jsonl and source_info.jsonl, so the same code can be rerun on the complete public RAGTruth release when those files are available in the execution environment.
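The alignment and attribution step described in the abstract can be illustrated with a minimal sketch. It is not the EvidenceChain-RF detector: it uses lexical overlap only, with no sparse retrieval, entailment rules, or supervised classifier, and every name in it (tokenize, attribute_answer, SentenceResult, the stopword list) is a hypothetical choice made for this example. For each answer sentence it picks the retrieved chunk with the highest content-word overlap as the attributed evidence and flags the words that the chunk does not contain.

```python
import re
from dataclasses import dataclass, field

# Hypothetical tokenizer: alphanumeric runs, keeping decimals such as "3.5" whole.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:\.[0-9]+)?")
# Small illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = frozenset({"the", "a", "an", "is", "was", "in", "of", "and", "to"})


def tokenize(text):
    """Lowercase word tokens of a text."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def split_sentences(text):
    """Naive sentence split on whitespace that follows ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


@dataclass
class SentenceResult:
    sentence: str
    evidence_index: int          # index of the attributed chunk, -1 if none matched
    overlap: float               # share of content words found in that chunk
    flagged_words: list = field(default_factory=list)


def attribute_answer(answer, chunks):
    """Attribute each answer sentence to its best-overlapping chunk and
    flag content words that the chunk does not support lexically."""
    chunk_tokens = [set(tokenize(c)) for c in chunks]
    results = []
    for sentence in split_sentences(answer):
        words = [w for w in tokenize(sentence) if w not in STOPWORDS]
        best_index, best_overlap = -1, 0.0
        for i, tokens in enumerate(chunk_tokens):
            overlap = sum(w in tokens for w in words) / len(words) if words else 0.0
            if overlap > best_overlap:
                best_index, best_overlap = i, overlap
        if best_index >= 0:
            flagged = [w for w in words if w not in chunk_tokens[best_index]]
        else:
            flagged = list(words)  # no chunk shares any content word
        results.append(SentenceResult(sentence, best_index, best_overlap, flagged))
    return results


if __name__ == "__main__":
    chunks = [
        "The Eiffel Tower was completed in 1889 and is 330 metres tall.",
        "The Louvre is the most visited museum in the world.",
    ]
    answer = "The Eiffel Tower was completed in 1899. The Louvre is the most visited museum."
    for r in attribute_answer(answer, chunks):
        print(r.evidence_index, f"{r.overlap:.2f}", r.flagged_words)
```

In this toy input the first sentence is attributed to the first chunk and only the mismatched year is flagged, while the second sentence is fully supported by the second chunk. A pure overlap rule of this kind misses paraphrases and cannot tell contradiction from omission, which is the gap that the mismatch features and supervised calibration named in the abstract are meant to address.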







