Execution-Validated Program-Supervised Complex KBQA: A Reproducible 120K-Question Study with KoPL-Style Programs

Xiaofei Luo

doi:10.69987/JACS.2024.40604

Authors

Xiaofei Luo, Information Science, University of Illinois at Urbana-Champaign, IL, US

Keywords:

complex KBQA, semantic parsing, neural-symbolic reasoning, program supervision, KoPL, SPARQL, execution-guided decoding, constrained decoding, interpretability

Abstract

Program-supervised complex knowledge base question answering (KBQA) converts a natural-language question into an executable program (e.g., KoPL or SPARQL) and then executes that program on a knowledge base (KB) to obtain the answer, yielding both strong neural-symbolic performance and step-by-step interpretability. This paper reports a fully reproducible end-to-end study of this paradigm with explicit program supervision and execution-validated decoding. We construct SynKQA-Pro, a self-contained benchmark that follows the data format and supervision signals of KQA Pro (120K multiple-choice questions, gold KoPL-style programs, gold SPARQL queries, hop annotations, and reasoning-type tags) while remaining fully executable without external KB dependencies. We evaluate two systems: a supervised program parser that predicts a program template and fills its entity slots, and an execution-validated variant that reranks the top-k candidate programs by executing them and selecting the first candidate that passes an answer-consistency check. On the SynKQA-Pro test set, the template classifier achieves 43.27% answer accuracy and 43.27% program exact match, whereas execution validation raises these to 98.85% answer accuracy and 98.66% program exact match while executing an average of 2.47 candidate programs per question. Error analysis shows that execution validation reduces wrong-template failures from 5,673 cases to 115 residual spurious-pass errors. All reported results, figures, and tables are produced directly from the released implementation with fixed random seeds.
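The execution-validated selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the released implementation: the function name `select_by_execution`, the toy KB, the three-token program format, and the assumption that the answer-consistency check means "the executed answer is one of the multiple-choice options" are all hypothetical and introduced here only for illustration.

```python
from typing import Callable, Hashable, Optional, Sequence, Tuple

# Hypothetical program format for this sketch: (operation, entity, attribute).
Program = Tuple[str, str, str]

# Toy knowledge base, invented for illustration only.
TOY_KB = {
    "Paris": {"country": "France", "population": 2100000},
    "Lyon": {"country": "France", "population": 520000},
}


def toy_execute(program: Program) -> Hashable:
    """Execute a toy program on TOY_KB; raises on unknown ops or missing keys."""
    op, entity, attribute = program
    if op != "QueryAttr":
        raise ValueError(f"unsupported operation: {op}")
    return TOY_KB[entity][attribute]


def select_by_execution(
    candidates: Sequence[Program],
    execute: Callable[[Program], Hashable],
    choices: Sequence[Hashable],
) -> Tuple[Optional[Program], Optional[Hashable], int]:
    """Walk the ranked candidates and return the first one whose executed
    answer passes the (assumed) consistency check: membership in `choices`.

    Returns (program, answer, number_of_programs_executed). If no candidate
    passes, falls back to the top-ranked program with answer None.
    """
    executed = 0
    for program in candidates:
        executed += 1
        try:
            answer = execute(program)
        except Exception:
            # A program that fails to execute cannot be validated; skip it.
            continue
        if answer in choices:
            return program, answer, executed
    fallback = candidates[0] if candidates else None
    return fallback, None, executed


# Ranked top-k candidates, as a parser might emit them (best first).
CANDIDATES = [
    ("QueryAttr", "Paris", "area"),        # fails: attribute missing in KB
    ("QueryAttr", "Paris", "country"),     # executes, but answer not in choices
    ("QueryAttr", "Paris", "population"),  # executes and passes the check
]
CHOICES = [2100000, 520000, 1000]

RESULT = select_by_execution(CANDIDATES, toy_execute, CHOICES)
print(RESULT)
```

In this sketch the third element of the returned tuple counts executed programs, which is the per-question quantity that the abstract averages to 2.47. A wrong program whose answer happens to fall among the choices would still pass this check, which is one way a spurious-pass error of the kind counted in the abstract could arise.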