Evidence-Grounded RAG for Cloud-Native DevOps: Hallucination-Resistant AIOps Question Answering over Private Operations Documents
DOI: https://doi.org/10.69987/JACS.2024.40308

Keywords: AIOps, retrieval-augmented generation, DevOps, private operations documents, hallucination, citation precision, cloud-native operations, BM25, dense retrieval, evidence grounding

Abstract
Private operations documents are essential in cloud-native DevOps, yet they are also difficult for general-purpose language models to use reliably because the relevant facts are enterprise-specific, acronym-heavy, and often absent from pretraining data. This paper presents an evidence-grounded retrieval-augmented generation (RAG) design for AIOps question answering over private operations documents. We evaluate the design on the public question file of the 2024 CCF International AIOps Challenge dataset, which contains 103 Chinese operations questions mapped to four manual families: RCP, Director, EMSPlus, and uMAC. The evaluation uses the full public question set and a deterministic silver evidence corpus generated from the released question/document metadata; retrieval is performed in a leave-one-question-out protocol so that the question being answered is never retrieved as its own evidence. We compare BM25, dense latent retrieval, hybrid retrieval, a domain-aware reranker, and an evidence-chain RAG variant that selects a supported document family and filters citations to that evidence chain. The empirical results are generated by executable scripts included with this manuscript. Evidence-Chain RAG achieves 92.23% document-level answer correctness, 97.09% Recall@3, 92.23% citation precision, and a 7.77% hallucination rate, reducing hallucination by 27.18 percentage points relative to BM25. The results show that citation filtering and multi-snippet evidence agreement are more important than retrieval recall alone when the objective is trustworthy DevOps assistance. The study provides a compact, reproducible benchmark for grounding-focused AIOps RAG under a sub-300 MB data constraint.
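
The leave-one-question-out protocol and the evidence-chain filtering described above can be illustrated with a minimal sketch. It is not the executable scripts that accompany the manuscript. The corpus entries, the token-overlap scorer (a simple stand-in for BM25 or dense scoring), the score-sum rule for choosing a family, and the function names `retrieve_loo`, `evidence_chain`, and `recall_at_k` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy corpus: (source_question_id, document_family, snippet_text).
# None of these entries come from the CCF AIOps Challenge dataset.
CORPUS = [
    ("q1", "RCP", "rcp alarm threshold configuration"),
    ("q2", "RCP", "rcp alarm severity levels"),
    ("q3", "Director", "director backup restore procedure"),
    ("q4", "Director", "director backup schedule policy"),
    ("q5", "uMAC", "umac license activation steps"),
    ("q6", "uMAC", "umac license expiry alarm"),
]


def overlap_score(query, text):
    """Count shared tokens; an assumed stand-in for a real retrieval score."""
    q, t = Counter(query.split()), Counter(text.split())
    return sum((q & t).values())


def retrieve_loo(qid, query, k=3):
    """Return the top-k (score, family, source_id) tuples, excluding the
    snippet derived from the question being answered (leave-one-out)."""
    pool = [
        (overlap_score(query, text), family, src)
        for src, family, text in CORPUS
        if src != qid
    ]
    # Highest score first; ties broken by source id for determinism.
    pool.sort(key=lambda item: (-item[0], item[2]))
    return pool[:k]


def evidence_chain(top):
    """Pick the family with the largest summed score among the retrieved
    snippets, then keep only citations that belong to that family."""
    votes = Counter()
    for score, family, _ in top:
        votes[family] += score
    chosen = sorted(votes, key=lambda fam: (-votes[fam], fam))[0]
    citations = [src for _, family, src in top if family == chosen]
    return chosen, citations


def recall_at_k(k=3):
    """Fraction of questions whose gold family appears in the top-k."""
    hits = 0
    for qid, gold_family, text in CORPUS:
        top = retrieve_loo(qid, text, k)
        hits += any(family == gold_family for _, family, _ in top)
    return hits / len(CORPUS)


if __name__ == "__main__":
    top = retrieve_loo("q1", "rcp alarm threshold configuration")
    print(evidence_chain(top))
    print(recall_at_k(3))
```

In this sketch, citation filtering discards retrieved snippets from families other than the selected one. That mirrors the abstract's point that Recall@k alone does not determine citation precision: a snippet can sit in the top k and still be an unsupported citation.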







