Self-supervised Regularization for Text Classification

Computation and Language 2021-03-25 v2 Machine Learning

Abstract

Text classification is a widely studied problem with broad applications. In many real-world settings, the number of texts available for training classification models is limited, which makes these models prone to overfitting. To address this problem, we propose SSL-Reg, a data-dependent regularization approach based on self-supervised learning (SSL). SSL is an unsupervised learning approach that defines auxiliary tasks on input data without using any human-provided labels and learns data representations by solving these auxiliary tasks. In SSL-Reg, a supervised classification task and an unsupervised SSL task are performed simultaneously. Because the SSL task is defined purely on the input texts, training the model on it can prevent the model from overfitting to the limited number of class labels in the classification task. Experiments on 17 text classification datasets demonstrate the effectiveness of the proposed method.
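As a rough illustration of performing the two tasks simultaneously, the sketch below adds a weighted self-supervised loss to the supervised classification loss. The abstract does not specify the auxiliary task or how the two losses are combined, so the weighted sum, the trade-off weight `lam`, the function names, and the choice of a token-prediction style auxiliary task are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def ssl_reg_loss(cls_logits, cls_label, ssl_logits, ssl_targets, lam=0.1):
    """Hypothetical joint objective: supervised loss + lam * SSL loss.

    cls_logits, cls_label: class scores and the human-provided label.
    ssl_logits, ssl_targets: per-position scores and targets derived from
        the input text itself (assumed here: identities of hidden tokens).
    lam: assumed trade-off weight; not taken from the source.
    """
    supervised = cross_entropy(cls_logits, cls_label)
    # Average the auxiliary loss over all self-supervised positions.
    self_supervised = sum(
        cross_entropy(l, t) for l, t in zip(ssl_logits, ssl_targets)
    ) / len(ssl_targets)
    return supervised + lam * self_supervised
```

Under these assumptions, setting `lam=0` recovers plain supervised training, while larger values push the shared model to also fit the label-free task, which is the regularizing effect the abstract describes.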

Cite

@article{arxiv.2103.05231,
  title  = {Self-supervised Regularization for Text Classification},
  author = {Meng Zhou and Zechen Li and Pengtao Xie},
  journal= {arXiv preprint arXiv:2103.05231},
  year   = {2021}
}

Comments

16 pages, 3 figures, to be published in Transactions of the Association for Computational Linguistics