[1] Thomas Reed and George Mason, “Hallucination Detection and Confidence Calibration for Large Language Model Outputs: Reproducible Experiments on HaluEval”, AIMLR, vol. 6, no. 4, pp. 1–17, Oct. 2025, doi: 10.69987/AIMLR.2025.60401.