Controllable Data Augmentation for Context-Dependent Text-to-SQL

Computation and Language 2023-05-01 v2

Abstract

The limited scale of annotated data constrains existing context-dependent text-to-SQL models because labeling is complex. Data augmentation is commonly used to address this problem. However, the data generated by current augmentation methods often lack diversity. In this paper, we introduce ConDA, which generates interactive questions and corresponding SQL results. We design the SQL dialogue state to enhance data diversity through state transitions. We also present a filter method that ensures data quality by using a grounding model to identify and filter low-quality questions that mismatch the state information. Experimental results on the SParC and CoSQL datasets show that ConDA boosts the baseline model to achieve an average improvement of 3.3% on complex questions. Moreover, our analysis of the augmented data reveals that the data generated by ConDA are of high quality in terms of SQL template hardness and types, turns, and question consistency.

Cite

@article{arxiv.2304.13902,
  title  = {Controllable Data Augmentation for Context-Dependent Text-to-SQL},
  author = {Dingzirui Wang and Longxu Dou and Wanxiang Che},
  journal= {arXiv preprint arXiv:2304.13902},
  year   = {2023}
}

Comments

fix overlap
