Self-Correcting Text-to-SQL Agent with Error Feedback: A Reproducible Closed-Loop Evaluation on Compact Executable SQLite Benchmarks

Siyu Chen; Wenhao Su; Jacob Ma

doi:10.69987/JACS.2024.41107

Authors

Siyu Chen, Information Management, UIUC, IL, USA
Wenhao Su, Computer Science, UCSD, CA, USA
Jacob Ma, Software Engineering, UC Irvine, CA, USA

Keywords:

Text-to-SQL, agentic data access, execution feedback, SQL repair, semantic parsing, SQLite, closed-loop agents, reproducible evaluation

Abstract

Text-to-SQL systems translate natural-language questions into executable database queries, but first-pass predictions often fail because of syntax errors, schema-linking mistakes, missing joins, wrong aggregation, value typos, and column choices that are valid against the schema but wrong for the question. This paper presents EFRA, an error-feedback repair agent that closes the loop between SQL generation, SQLite execution, and deterministic correction. EFRA receives a question, a schema, and a first-pass SQL candidate. It executes the candidate; parses SQLite errors or empty-like results; repairs syntax, table names, column names, join paths, aggregators, predicates, grouping, limits, and values; and repeats the process under a fixed feedback-turn budget. Rather than reporting illustrative results, the study conducts a full empirical evaluation on two compact executable datasets included with the artifact: ExecSpiderLite, a 240-example cross-domain multi-table benchmark, and ExecWikiSQLLite, a 180-example single-table benchmark. The evaluation executes every gold and predicted query over 14 SQLite databases and logs exact match, validity, execution accuracy, number of executions, and latency. Over all 420 examples, EFRA reaches 90.7% execution accuracy and 100.0% validity, compared with 13.1% execution accuracy for the first-pass generator, 19.8% for a syntax-only guard, and 55.0% for execution-only repair. Ablations show that semantic validators and value repair are both necessary: removing semantic validators reduces execution accuracy to 55.0%, and removing value repair reduces it to 79.3%. The results establish a concise, reproducible baseline for agentic data access in which execution feedback is treated as a first-class signal rather than a post-hoc check.
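
The execute, parse, repair, repeat cycle described in the abstract can be illustrated with a minimal Python sketch. It is not EFRA's implementation: the names `run_sql`, `repair_loop`, and `make_name_repair` are hypothetical, the default budget of three turns is an assumption, and the toy repairer handles only misspelled table and column names by fuzzy-matching SQLite's "no such table" and "no such column" messages against the schema. Semantic validators, join repair, and value repair are out of scope here.

```python
import difflib
import re
import sqlite3


def run_sql(conn, sql):
    """Execute sql and return (rows, error); exactly one of the two is None."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)


def repair_loop(conn, sql, repair, max_turns=3):
    """Closed loop: execute, read the feedback, repair, and re-execute,
    using at most max_turns repair attempts (an assumed budget)."""
    rows, error = run_sql(conn, sql)
    executions = 1
    for _ in range(max_turns):
        if error is None and rows:      # valid and non-empty: accept
            break
        fixed = repair(sql, error)      # error is None for an empty result
        if fixed is None or fixed == sql:
            break                       # the repairer has nothing to offer
        sql = fixed
        rows, error = run_sql(conn, sql)
        executions += 1
    return sql, rows, error, executions


def make_name_repair(conn):
    """Toy repairer (hypothetical): maps an unknown table or column name
    to the closest name found in the database schema."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    columns = [r[1] for t in tables
               for r in conn.execute(f"PRAGMA table_info({t})")]

    def repair(sql, error):
        if error is None:
            return None                 # empty results are not handled here
        m = re.match(r"no such (table|column): (\S+)", error)
        if m is None:
            return None
        kind, bad = m.groups()
        pool = tables if kind == "table" else columns
        close = difflib.get_close_matches(bad, pool, n=1)
        if not close:
            return None
        return re.sub(rf"\b{re.escape(bad)}\b", lambda _m: close[0], sql)

    return repair


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO singer VALUES (?, ?)",
                     [("Ann", 41), ("Bo", 25)])
    print(repair_loop(conn, "SELECT nme FROM singers WHERE age > 30",
                      make_name_repair(conn)))
```

In the demo, the candidate misspells both the table and the column, so the loop needs two repair turns and three executions before it accepts the query. Returning the execution count alongside the final SQL mirrors the per-example logging of executions that the evaluation reports.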