Formal Algorithms for Transformers

Machine Learning; Artificial Intelligence; Computation and Language; Neural and Evolutionary Computing

2022-07-26 (v1)

Abstract

This document aims to be a self-contained, mathematically precise overview of transformer architectures and algorithms (*not* results). It covers what transformers are, how they are trained, what they are used for, their key architectural components, and a preview of the most prominent models. The reader is assumed to be familiar with basic ML terminology and simpler neural network architectures such as MLPs.

Cite

@article{arxiv.2207.09238,
  title   = {Formal Algorithms for Transformers},
  author  = {Mary Phuong and Marcus Hutter},
  journal = {arXiv preprint arXiv:2207.09238},
  year    = {2022}
}

Comments

16 pages, 15 algorithms