Data Augmentation for Intent Classification

Computation and Language 2022-06-14 v1

Abstract

Training accurate intent classifiers requires labeled data, which can be costly to obtain. Data augmentation methods may ameliorate this issue, but the quality of the generated data varies significantly across techniques. We study the process of systematically producing pseudo-labeled data given a small seed set using a wide variety of data augmentation techniques, including mixing methods together. We find that while certain methods dramatically improve qualitative and quantitative performance, other methods have minimal or even negative impact. We also analyze key considerations when implementing data augmentation methods in production.
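The abstract does not name the individual augmentation techniques. As a rough illustration of what "producing pseudo-labeled data given a small seed set" and "mixing methods together" can look like, here is a minimal sketch under stated assumptions. It uses two simple token-level perturbations, random swap and random deletion in the style of EDA, as stand-ins. Each generated utterance inherits the intent label of the seed it came from, and exact duplicates are dropped. The function names (`random_swap`, `random_delete`, `augment`), the example intents, and the parameters are hypothetical. This is not the authors' method or pipeline.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical example type: (utterance, intent label)
Example = Tuple[str, str]


def random_swap(text: str, rng: random.Random) -> str:
    """Swap two randomly chosen token positions (an EDA-style perturbation)."""
    tokens = text.split()
    if len(tokens) < 2:
        return text
    i, j = rng.sample(range(len(tokens)), 2)
    tokens[i], tokens[j] = tokens[j], tokens[i]
    return " ".join(tokens)


def random_delete(text: str, rng: random.Random, p: float = 0.1) -> str:
    """Drop each token with probability p, always keeping at least one token."""
    tokens = text.split()
    if not tokens:
        return text
    kept = [t for t in tokens if rng.random() > p]
    return " ".join(kept) if kept else rng.choice(tokens)


def augment(
    seed: List[Example],
    methods: List[Callable[[str, random.Random], str]],
    n_per_example: int,
    seed_value: int = 0,
) -> List[Example]:
    """Generate pseudo-labeled examples by mixing augmentation methods.

    Assumption: the pseudo-label is simply the label of the seed utterance.
    """
    rng = random.Random(seed_value)
    seen = {text for text, _ in seed}
    generated: List[Example] = []
    for text, label in seed:
        for _ in range(n_per_example):
            method = rng.choice(methods)  # mixing: pick a method per sample
            new_text = method(text, rng)
            if new_text not in seen:  # skip exact duplicates of seeds or outputs
                seen.add(new_text)
                generated.append((new_text, label))
    return generated


# Hypothetical seed set with two intents
seed_set = [
    ("book a table for two tonight", "restaurant_booking"),
    ("what is the weather in paris", "weather_query"),
]
pseudo = augment(seed_set, [random_swap, random_delete], n_per_example=3)
for utterance, intent in pseudo:
    print(intent, "|", utterance)
```

Token-level perturbations like these are cheap but can distort meaning, which is one reason generated data quality varies across techniques. In practice, a filtering step on the generated examples would typically sit after `augment`.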

Cite

@article{arxiv.2206.05790,
  title  = {Data Augmentation for Intent Classification},
  author = {Derek Chen and Claire Yin},
  journal= {arXiv preprint arXiv:2206.05790},
  year   = {2022}
}

Comments

8 pages, 3 tables. Accepted to the NeurIPS 2021 Data-centric AI Workshop.