Muppet: Massive Multi-task Representations with Pre-Finetuning

Armen Aghajanyan; Anchit Gupta; Akshat Shrivastava; Xilun Chen; Luke Zettlemoyer; Sonal Gupta

Computation and Language; Machine Learning (2021-01-28, v1)

Abstract

We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. Pre-finetuning is massively multi-task learning (around 50 datasets, over 4.8 million total labeled examples) and is designed to encourage representations that generalize better across many different tasks. We show that pre-finetuning consistently improves performance for pretrained discriminators (e.g. RoBERTa) and generation models (e.g. BART) on a wide range of tasks (sentence prediction, commonsense reasoning, machine reading comprehension (MRC), etc.), while also significantly improving sample efficiency during fine-tuning. We also show that large-scale multi-tasking is crucial: when only a few tasks are used, pre-finetuning can hurt performance, up to a critical point (usually above 15 tasks), after which performance improves linearly in the number of tasks.

Keywords

pre-trained language model; instruction tuning; multi-task learning

Cite

@article{arxiv.2101.11038,
  title  = {Muppet: Massive Multi-task Representations with Pre-Finetuning},
  author = {Armen Aghajanyan and Anchit Gupta and Akshat Shrivastava and Xilun Chen and Luke Zettlemoyer and Sonal Gupta},
  journal= {arXiv preprint arXiv:2101.11038},
  year   = {2021}
}