Compressive Transformers for Long-Range Sequence Modelling

Jack W. Rae; Anna Potapenko; Siddhant M. Jayakumar; Timothy P. Lillicrap

Machine Learning, 2019-11-14, v1

Abstract

We present the Compressive Transformer, an attentive sequence model which compresses past memories for long-range sequence learning. We find the Compressive Transformer obtains state-of-the-art language modelling results in the WikiText-103 and Enwik8 benchmarks, achieving 17.1 ppl and 0.97 bpc respectively. We also find it can model high-frequency speech effectively and can be used as a memory mechanism for RL, demonstrated on an object matching task. To promote the domain of long-range sequence learning, we propose a new open-vocabulary language modelling benchmark derived from books, PG-19.

Keywords

model transformation; speech recognition; transformer

Cite

@article{arxiv.1911.05507,
  title  = {Compressive Transformers for Long-Range Sequence Modelling},
  author = {Jack W. Rae and Anna Potapenko and Siddhant M. Jayakumar and Timothy P. Lillicrap},
  journal= {arXiv preprint arXiv:1911.05507},
  year   = {2019}
}

Comments

19 pages, 6 figures, 10 tables
