
SemanticAgent: A Semantics-Aware Framework for Text-to-SQL Data Synthesis

Artificial Intelligence 2026-04-24 v1

Abstract

Existing text-to-SQL synthesis pipelines still conflate executability with semantic validity: syntactic checks and execution-based validation can retain queries that execute successfully yet violate database semantics. To address this limitation, we propose SemanticAgent, a semantics-aware synthesis framework organized around three specialized modules: an analyzer, a synthesizer, and a verifier. Through a three-stage protocol of semantic analysis, stepwise synthesis, and diagnostic refinement, SemanticAgent turns validation from an execution-only check into a traceable reasoning process. The resulting synthetic data consistently outperforms that of prior synthesis methods under semantic-quality evaluation and leads to stronger downstream fine-tuning performance, especially on semantically demanding benchmarks.
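The gap between executability and semantic validity can be illustrated with a small sketch. It is not the SemanticAgent implementation: the schema, the two queries, and the helper names (`build_db`, `executes`, `run`, `join_matches_foreign_key`) are assumptions made up for illustration, and the foreign-key check stands in for whatever semantic analysis the framework actually performs. Both queries pass an execution-based check, but only one joins along a declared foreign key.

```python
import sqlite3

def build_db():
    """Create a tiny in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name TEXT
        );
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(customer_id),
            amount REAL
        );
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bo');
        INSERT INTO orders VALUES (1, 2, 10.0), (2, 2, 5.0), (3, 1, 7.0);
    """)
    return conn

# Question: "What is the total order amount per customer?"
# Joins orders to customers along the declared foreign key.
CORRECT_SQL = """
    SELECT c.name, SUM(o.amount)
    FROM orders o JOIN customers c ON o.customer_id = c.customer_id
    GROUP BY c.name ORDER BY c.name
"""
# Joins on order_id instead: valid SQL, but semantically meaningless.
WRONG_SQL = """
    SELECT c.name, SUM(o.amount)
    FROM orders o JOIN customers c ON o.order_id = c.customer_id
    GROUP BY c.name ORDER BY c.name
"""

def executes(conn, sql):
    """Execution-based validation: does the query run without error?"""
    try:
        conn.execute(sql).fetchall()
        return True
    except sqlite3.Error:
        return False

def run(conn, sql):
    """Return all result rows of a query."""
    return conn.execute(sql).fetchall()

def join_matches_foreign_key(conn, child_table, child_col, parent_table, parent_col):
    """Toy semantic check: is this join pair a declared foreign key?"""
    # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
    rows = conn.execute(f"PRAGMA foreign_key_list({child_table})").fetchall()
    declared = {(r[3], r[2], r[4]) for r in rows}
    return (child_col, parent_table, parent_col) in declared

if __name__ == "__main__":
    conn = build_db()
    for label, sql in [("correct", CORRECT_SQL), ("wrong", WRONG_SQL)]:
        print(label, executes(conn, sql), run(conn, sql))
```

Here the join columns are passed to the check by hand; extracting them from arbitrary SQL would require a parser, which this sketch deliberately leaves out.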

Cite

@article{arxiv.2604.21414,
  title  = {SemanticAgent: A Semantics-Aware Framework for Text-to-SQL Data Synthesis},
  author = {Qiang Gao and Zhenping Li and Anqi Zhuo and Yingxiao Zhao and Weibo Geng and Xiaosong Li},
  journal= {arXiv preprint arXiv:2604.21414},
  year   = {2026}
}