Few-shot Sequence Learning with Transformers

Lajanugen Logeswaran; Ann Lee; Myle Ott; Honglak Lee; Marc'Aurelio Ranzato; Arthur Szlam

Machine Learning 2020-12-18 v1

Abstract

Few-shot algorithms aim to learn new tasks from only a handful of training examples. In this work we investigate few-shot learning in the setting where the data points are sequences of tokens, and propose an efficient learning algorithm based on Transformers. In the simplest setting, we append to the input sequence a token that represents the particular task to be undertaken, and show that the embedding of this token can be optimized on the fly given a few labeled examples. Our approach requires neither complicated changes to the model architecture, such as adapter layers, nor the computation of second-order derivatives, as is currently popular in the meta-learning and few-shot learning literature. We demonstrate our approach on a variety of tasks and analyze the generalization properties of several model variants and baseline approaches. In particular, we show that compositional task descriptors can improve performance. Experiments show that our approach works at least as well as other methods while being more computationally efficient.
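
The core idea in the abstract, adapting to a new task by optimizing only the embedding of an appended task token while all other weights stay fixed, can be illustrated with a minimal sketch. This is not the authors' implementation: the frozen model here is a toy mean-pooling logistic classifier standing in for a Transformer, and every name and value (`TOKEN_EMB`, `READOUT`, `adapt`, the support set) is a hypothetical chosen for illustration. In this toy the task token can only shift the pooled representation, whereas in a Transformer it would interact with the input through attention.

```python
import math

# Frozen "pretrained" parameters (hypothetical toy values, not from the paper).
TOKEN_EMB = {            # token id -> 2-d embedding
    0: [1.0, 0.0],
    1: [0.0, 1.0],
    2: [-1.0, 0.5],
}
READOUT = [1.5, -2.0]    # frozen linear readout applied to the pooled vector


def predict(tokens, task_emb):
    """Append the task token, mean-pool all embeddings, apply the frozen readout."""
    vectors = [TOKEN_EMB[t] for t in tokens] + [task_emb]
    pooled = [sum(col) / len(vectors) for col in zip(*vectors)]
    logit = sum(w * h for w, h in zip(READOUT, pooled))
    return 1.0 / (1.0 + math.exp(-logit))


def loss(examples, task_emb):
    """Mean binary cross-entropy over the few labeled examples."""
    total = 0.0
    for tokens, label in examples:
        p = predict(tokens, task_emb)
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(examples)


def adapt(examples, dim=2, steps=200, lr=0.5):
    """First-order gradient descent on the task embedding only."""
    task_emb = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for tokens, label in examples:
            p = predict(tokens, task_emb)
            # d(BCE)/d(logit) = p - label; d(logit)/d(task_emb[k]) = READOUT[k] / (n + 1)
            scale = (p - label) / (len(tokens) + 1) / len(examples)
            for k in range(dim):
                grad[k] += scale * READOUT[k]
        task_emb = [z - lr * g for z, g in zip(task_emb, grad)]
    return task_emb


# A handful of labeled sequences for a new task (made-up data).
support = [([0, 0, 1], 1), ([0, 2], 1), ([1, 1, 0], 0), ([0, 0], 1)]
initial = loss(support, [0.0, 0.0])
task_emb = adapt(support)
final = loss(support, task_emb)
print(f"loss before: {initial:.3f}, after: {final:.3f}")
```

Only `task_emb` is updated, so adapting to a task costs a few first-order gradient steps on a single small vector, with no second-order derivatives and no change to the frozen parameters.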

Keywords

few-shot learning, transformer, machine learning theory

Cite

@article{arxiv.2012.09543,
  title  = {Few-shot Sequence Learning with Transformers},
  author = {Lajanugen Logeswaran and Ann Lee and Myle Ott and Honglak Lee and Marc'Aurelio Ranzato and Arthur Szlam},
  journal= {arXiv preprint arXiv:2012.09543},
  year   = {2020}
}

Comments

NeurIPS Meta-Learning Workshop 2020
