Execution-Feedback and Retrieval-Augmented Generation for Conversational Text-to-SQL: From One-Shot Questions to Clarification-Driven Executable Dialogs
DOI: https://doi.org/10.69987/JACS.2023.30201

Keywords:
Conversational BI, execution feedback, schema linking, multi-turn semantic parsing

Abstract
Conversational business intelligence (BI) requires systems that transform multi-turn user requests into executable database queries while recovering from ambiguity, schema mismatch, and SQL runtime errors. Prior text-to-SQL benchmarks such as Spider, SParC, and CoSQL demonstrate that strong single-turn parsers degrade sharply when context must be carried across turns and when the system must ask for missing constraints. This paper studies a practical pipeline that combines (i) schema linking and retrieval-augmented generation (RAG) over schema snippets and exemplar queries, and (ii) an execution-feedback loop that executes candidate SQL, parses runtime feedback, and repairs queries or elicits clarification. To ensure full reproducibility of end-to-end experiments, we construct three format-preserving benchmarks—ProxySpider, ProxySParC, and ProxyCoSQL—whose schemas, SQL templates, and dialog phenomena are consistent with the original datasets’ cross-domain and multi-turn design. We evaluate three paradigms: a prompt-only baseline that maps each user turn independently, a Schema-RAG system that performs schema-aware retrieval and dialog-state grounding, and an Exec-Feedback system that iteratively repairs SQL using execution errors. Across 30 SQLite databases in 10 domains, Exec-Feedback improves execution accuracy over Schema-RAG on all three benchmarks and yields the highest dialog-level success on ProxySParC and ProxyCoSQL. Detailed analyses quantify how execution feedback shifts the error profile from schema/syntax failures toward residual semantic and context errors, and how RAG and dialog-state grounding interact to sustain accuracy as conversations lengthen.
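The execution-feedback loop described above can be sketched minimally: execute a candidate SQL query, and on a runtime error feed the error message back into a repair step before retrying. The sketch below is illustrative only; the names `repair_sql`, `execute_with_feedback`, and `MAX_ATTEMPTS` are hypothetical, and the repair step is a toy stand-in for the model-driven rewriting the paper evaluates.

```python
import sqlite3

MAX_ATTEMPTS = 3  # illustrative retry budget, not a value from the paper

def repair_sql(sql: str, error: str) -> str:
    """Toy repair step: a real Exec-Feedback system would prompt an LLM
    with the failing query and the error text; here we only patch one
    hypothetical column-name typo to make the loop observable."""
    if "no such column" in error and "nme" in sql:
        return sql.replace("nme", "name")
    return sql

def execute_with_feedback(conn: sqlite3.Connection, sql: str):
    """Execute candidate SQL; on failure, parse the runtime error,
    attempt a repair, and retry. If the budget is exhausted, escalate
    to the user for clarification (here, raise an exception)."""
    for _ in range(MAX_ATTEMPTS):
        try:
            return conn.execute(sql).fetchall(), sql
        except sqlite3.Error as exc:
            sql = repair_sql(sql, str(exc))
    raise RuntimeError("could not repair query; clarification needed")

# Tiny in-memory database for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT)")
conn.execute("INSERT INTO employees VALUES ('Ada', 'BI')")

rows, fixed = execute_with_feedback(conn, "SELECT nme FROM employees")
print(rows)   # [('Ada',)]
print(fixed)  # SELECT name FROM employees
```

In the full pipeline the repair step would also consult the retrieved schema snippets and dialog state, and unrepairable failures would trigger a clarification turn rather than an exception.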
