Related papers: Formal Algorithms for Transformers

Introduction to Transformers: an NLP Perspective

Transformers have dominated empirical machine learning models of natural language processing. In this paper, we introduce basic concepts of Transformers and present key techniques that form the recent advances of these models. This includes…

Computation and Language · Computer Science 2023-11-30 Tong Xiao, Jingbo Zhu

An Introduction to Transformers

The transformer is a neural network component that can be used to learn useful representations of sequences or sets of data-points. The transformer has driven recent advances in natural language processing, computer vision, and…

Machine Learning · Computer Science 2026-01-21 Richard E. Turner

Transformadores: Fundamentos teoricos y Aplicaciones

Transformers are a neural network architecture originally developed for natural language processing, which have since become a foundational tool for solving a wide range of problems, including text, audio, image processing, reinforcement…

Computation and Language · Computer Science 2025-05-06 Jordi de la Torre

Advances in Transformers for Robotic Applications: A Review

The introduction of Transformers architecture has brought about significant breakthroughs in Deep Learning (DL), particularly within Natural Language Processing (NLP). Since their inception, Transformers have outperformed many traditional…

Robotics · Computer Science 2024-12-17 Nikunj Sanghai, Nik Bear Brown

Transformers in Time-series Analysis: A Tutorial

Transformer architecture has widespread applications, particularly in Natural Language Processing and computer vision. Recently Transformers have been employed in various aspects of time-series analysis. This tutorial provides an overview…

Machine Learning · Computer Science 2023-07-27 Sabeen Ahmed, Ian E. Nielsen, Aakash Tripathi, Shamoon Siddiqui, Ghulam Rasool, Ravi P. Ramachandran

A Survey on Transformers in Reinforcement Learning

Transformer has been considered the dominating neural architecture in NLP and CV, mostly under supervised settings. Recently, a similar surge of using Transformers has appeared in the domain of reinforcement learning (RL), but it is faced…

Machine Learning · Computer Science 2023-09-22 Wenzhe Li, Hao Luo, Zichuan Lin, Chongjie Zhang, Zongqing Lu, Deheng Ye

Introduction to Sequence Modeling with Transformers

Understanding the transformer architecture and its workings is essential for machine learning (ML) engineers. However, truly understanding the transformer architecture can be demanding, even if you have a solid background in machine…

Machine Learning · Computer Science 2025-02-28 Joni-Kristian Kämäräinen

Can Transformers Learn $n$-gram Language Models?

Much theoretical work has described the ability of transformers to represent formal languages. However, linking theoretical results to empirical performance is not straightforward due to the complex interplay between the architecture, the…

Computation and Language · Computer Science 2024-10-07 Anej Svete, Nadav Borenstein, Mike Zhou, Isabelle Augenstein, Ryan Cotterell

Anatomy of Neural Language Models

The fields of generative AI and transfer learning have experienced remarkable advancements in recent years especially in the domain of Natural Language Processing (NLP). Transformers have been at the heart of these advancements where the…

Computation and Language · Computer Science 2024-02-28 Majd Saleh, Stéphane Paquelet

Transformers are Graph Neural Networks

We establish connections between the Transformer architecture, originally introduced for natural language processing, and Graph Neural Networks (GNNs) for representation learning on graphs. We show how Transformers can be viewed as message…

Machine Learning · Computer Science 2025-06-30 Chaitanya K. Joshi

Transformers in Healthcare: A Survey

With Artificial Intelligence (AI) increasingly permeating various aspects of society, including healthcare, the adoption of the Transformers neural network architecture is rapidly changing many applications. Transformer is a type of deep…

Artificial Intelligence · Computer Science 2024-11-25 Subhash Nerella, Sabyasachi Bandyopadhyay, Jiaqing Zhang, Miguel Contreras, Scott Siegel, Aysegul Bumin, Brandon Silva, Jessica Sena, Benjamin Shickel, Azra Bihorac, Kia Khezeli, Parisa Rashidi

Understanding Transformers and Attention Mechanisms: An Introduction for Applied Mathematicians

This document provides a brief introduction to the attention mechanism used in modern language models based on the Transformer architecture. We first illustrate how text is encoded as vectors and how the attention mechanism processes these…

Numerical Analysis · Mathematics 2026-04-02 Michel Fabrice Serret

Transformers in Reinforcement Learning: A Survey

Transformers have significantly impacted domains like natural language processing, computer vision, and robotics, where they improve performance compared to other neural networks. This survey explores how transformers are used in…

Machine Learning · Computer Science 2023-07-13 Pranav Agarwal, Aamer Abdul Rahman, Pierre-Luc St-Charles, Simon J. D. Prince, Samira Ebrahimi Kahou

Efficient Transformers: A Survey

Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning. In the field of natural language processing for example,…

Machine Learning · Computer Science 2022-03-15 Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler

Transformers in Medical Image Analysis: A Review

Transformers have dominated the field of natural language processing, and recently impacted the computer vision area. In the field of medical image analysis, Transformers have also been successfully applied to full-stack clinical…

Computer Vision and Pattern Recognition · Computer Science 2022-08-22 Kelei He, Chen Gan, Zhuoyuan Li, Islem Rekik, Zihao Yin, Wen Ji, Yang Gao, Qian Wang, Junfeng Zhang, Dinggang Shen

To Transformers and Beyond: Large Language Models for the Genome

In the rapidly evolving landscape of genomics, deep learning has emerged as a useful tool for tackling complex computational challenges. This review focuses on the transformative role of Large Language Models (LLMs), which are mostly based…

Genomics · Quantitative Biology 2023-11-15 Micaela E. Consens, Cameron Dufault, Michael Wainberg, Duncan Forster, Mehran Karimzadeh, Hani Goodarzi, Fabian J. Theis, Alan Moses, Bo Wang

What Formal Languages Can Transformers Express? A Survey

As transformers have gained prominence in natural language processing, some researchers have investigated theoretically what problems they can and cannot solve, by treating problems as formal languages. Exploring such questions can help…

Machine Learning · Computer Science 2024-09-05 Lena Strobl, William Merrill, Gail Weiss, David Chiang, Dana Angluin

Transformers Pretrained on Procedural Data Contain Modular Structures for Algorithmic Reasoning

Pretraining on large, semantically rich datasets is key for developing language models. Surprisingly, recent studies have shown that even synthetic data, generated procedurally through simple semantic-free algorithms, can yield some of the…

Machine Learning · Computer Science 2025-05-29 Zachary Shinnick, Liangze Jiang, Hemanth Saratchandran, Anton van den Hengel, Damien Teney

Combining Transformers with Natural Language Explanations

Many NLP applications require models to be interpretable. However, many successful neural architectures, including transformers, still lack effective interpretation methods. A possible solution could rely on building explanations from…

Computation and Language · Computer Science 2024-04-04 Federico Ruggeri, Marco Lippi, Paolo Torroni

Automating the Analysis of Parsing Algorithms (and other Dynamic Programs)

Much algorithmic research in NLP aims to efficiently manipulate rich formal structures. An algorithm designer typically seeks to provide guarantees about their proposed algorithm -- for example, that its running time or space complexity is…

Programming Languages · Computer Science 2025-12-30 Tim Vieira, Ryan Cotterell, Jason Eisner