Token Turing Machines

Michael S. Ryoo; Keerthana Gopalakrishnan; Kumara Kahatapitiya; Ted Xiao; Kanishka Rao; Austin Stone; Yao Lu; Julian Ibarz; Anurag Arnab

Subjects: Machine Learning; Computer Vision and Pattern Recognition; Robotics. Version v2, 2023-04-14.

Abstract

We propose Token Turing Machines (TTM), a sequential, autoregressive Transformer model with memory for real-world sequential visual understanding. Our model is inspired by the seminal Neural Turing Machine, and has an external memory consisting of a set of tokens which summarise the previous history (i.e., frames). This memory is efficiently addressed, read and written using a Transformer as the processing unit/controller at each step. The model's memory module ensures that a new observation will only be processed with the contents of the memory (and not the entire history), meaning that it can efficiently process long sequences with a bounded computational cost at each step. We show that TTM outperforms other alternatives, such as other Transformer models designed for long sequences and recurrent neural networks, on two real-world sequential visual understanding tasks: online temporal activity detection from videos and vision-based robot action policy learning. Code is publicly available at: https://github.com/google-research/scenic/tree/main/scenic/projects/token_turing
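The read/process/write loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes a simple softmax-weighted pooling as the token summarizer and replaces the Transformer controller with an element-wise `tanh` placeholder. All names here (`summarize`, `process`, `ttm_step`, `rand_vecs`) and all sizes are hypothetical. The point is only that each step touches the fixed-size memory plus the new frame's tokens, never the full history.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def summarize(tokens, score_weights):
    """Pool any number of tokens into len(score_weights) tokens.

    Each output token is a softmax-weighted average of the input tokens.
    score_weights stands in for a learned scoring function (assumption).
    """
    dim = len(tokens[0])
    pooled = []
    for w in score_weights:
        scores = [sum(wi * ti for wi, ti in zip(w, tok)) for tok in tokens]
        alphas = softmax(scores)
        pooled.append(
            [sum(a * tok[j] for a, tok in zip(alphas, tokens)) for j in range(dim)]
        )
    return pooled


def process(tokens):
    """Placeholder for the Transformer controller (assumption): tanh per element."""
    return [[math.tanh(x) for x in tok] for tok in tokens]


def ttm_step(memory, frame_tokens, read_w, write_w):
    """One step: read from memory and input, process, then write a new memory."""
    read = summarize(memory + frame_tokens, read_w)                 # r tokens
    output = process(read)                                          # r tokens
    new_memory = summarize(memory + frame_tokens + output, write_w)  # m tokens
    return new_memory, output


random.seed(0)
d, m, r, n = 4, 6, 3, 5  # token dim, memory size, read size, tokens per frame


def rand_vecs(k):
    return [[random.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(k)]


memory = [[0.0] * d for _ in range(m)]
read_w, write_w = rand_vecs(r), rand_vecs(m)
for _ in range(10):  # ten frames; per-step work does not grow with history
    memory, out = ttm_step(memory, rand_vecs(n), read_w, write_w)
```

In this sketch the per-step cost depends only on `m`, `n`, and `r`, which mirrors the bounded per-step cost claimed in the abstract. The history enters only through the `m` memory tokens.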

Keywords

long short-term memory; vision transformer; transformer

Cite

@article{arxiv.2211.09119,
  title  = {Token Turing Machines},
  author = {Michael S. Ryoo and Keerthana Gopalakrishnan and Kumara Kahatapitiya and Ted Xiao and Kanishka Rao and Austin Stone and Yao Lu and Julian Ibarz and Anurag Arnab},
  journal= {arXiv preprint arXiv:2211.09119},
  year   = {2023}
}

Comments

CVPR 2023 camera-ready copy
