Related papers: Do Efficient Transformers Really Save Computation?

Sparse is Enough in Scaling Transformers

Large Transformer models yield impressive results on many tasks, but are expensive to train, or even fine-tune, and so slow at decoding that their use and study becomes out of reach. We address this problem by leveraging sparsity. We study…

Machine Learning · Computer Science · 2021-11-29 · Sebastian Jaszczur, Aakanksha Chowdhery, Afroz Mohiuddin, Łukasz Kaiser, Wojciech Gajewski, Henryk Michalewski, Jonni Kanerva

Efficient Transformers: A Survey

Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning. In the field of natural language processing for example,…

Machine Learning · Computer Science · 2022-03-15 · Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler

A Practical Survey on Faster and Lighter Transformers

Recurrent neural networks are effective models to process sequences. However, they are unable to learn long-term dependencies because of their inherent sequential nature. As a solution, Vaswani et al. introduced the Transformer, a model…

Machine Learning · Computer Science · 2023-03-28 · Quentin Fournier, Gaétan Marceau Caron, Daniel Aloise

What Formal Languages Can Transformers Express? A Survey

As transformers have gained prominence in natural language processing, some researchers have investigated theoretically what problems they can and cannot solve, by treating problems as formal languages. Exploring such questions can help…

Machine Learning · Computer Science · 2024-09-05 · Lena Strobl, William Merrill, Gail Weiss, David Chiang, Dana Angluin

Efficiency 360: Efficient Vision Transformers

Transformers are widely used for solving tasks in natural language processing, computer vision, speech, and music domains. In this paper, we talk about the efficiency of transformers in terms of memory (the number of parameters),…

Computer Vision and Pattern Recognition · Computer Science · 2023-02-27 · Badri N. Patro, Vijay Srinivas Agneeswaran

Can pruning make Large Language Models more efficient?

Transformer models have revolutionized natural language processing with their unparalleled ability to grasp complex contextual relationships. However, the vast number of parameters in these models has raised concerns regarding computational…

Machine Learning · Computer Science · 2023-10-10 · Sia Gholami, Marwan Omar

A Quantitative Review on Language Model Efficiency Research

Language models (LMs) are being scaled and becoming powerful. Improving their efficiency is one of the core research topics in neural information processing systems. Tay et al. (2022) provided a comprehensive overview of efficient…

Machine Learning · Computer Science · 2023-06-06 · Meng Jiang, Hy Dang, Lingbo Tong

Efficient Transformers with Dynamic Token Pooling

Transformers achieve unrivalled performance in modelling language, but remain inefficient in terms of memory and time complexity. A possible remedy is to reduce the sequence length in the intermediate layers by pooling fixed-length segments…

Computation and Language · Computer Science · 2023-10-25 · Piotr Nawrot, Jan Chorowski, Adrian Łańcucki, Edoardo M. Ponti

Reformer: The Efficient Transformer

Large Transformer models routinely achieve state-of-the-art results on a number of tasks but training these models can be prohibitively costly, especially on long sequences. We introduce two techniques to improve the efficiency of…

Machine Learning · Computer Science · 2020-02-19 · Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya

Transformer Acceleration with Dynamic Sparse Attention

Transformers are the mainstream of NLP applications and are becoming increasingly popular in other domains such as Computer Vision. Despite the improvements in model quality, the enormous computation costs make Transformers difficult at…

Machine Learning · Computer Science · 2021-10-22 · Liu Liu, Zheng Qu, Zhaodong Chen, Yufei Ding, Yuan Xie

Transformer-based Models for Long-Form Document Matching: Challenges and Empirical Analysis

Recent advances in the area of long document matching have primarily focused on using transformer-based models for long document encoding and matching. There are two primary challenges associated with these models. Firstly, the performance…

Computation and Language · Computer Science · 2023-02-09 · Akshita Jha, Adithya Samavedhi, Vineeth Rakesh, Jaideep Chandrashekar, Chandan K. Reddy

Looped Transformers are Better at Learning Learning Algorithms

Transformers have demonstrated effectiveness in in-context solving data-fitting problems from various (latent) models, as reported by Garg et al. However, the absence of an inherent iterative structure in the transformer architecture…

Machine Learning · Computer Science · 2024-03-19 · Liu Yang, Kangwook Lee, Robert Nowak, Dimitris Papailiopoulos

Sparse or Dense? A Mechanistic Estimation of Computation Density in Transformer-based LLMs

Transformer-based large language models (LLMs) are comprised of billions of parameters arranged in deep and wide computational graphs. Several studies on LLM efficiency optimization argue that it is possible to prune a significant portion…

Computation and Language · Computer Science · 2026-04-16 · Corentin Kervadec, Iuliia Lysova, Marco Baroni, Gemma Boleda

Are Transformers More Robust? Towards Exact Robustness Verification for Transformers

As an emerging type of Neural Networks (NNs), Transformers are used in many domains ranging from Natural Language Processing to Autonomous Driving. In this paper, we study the robustness problem of Transformers, a key characteristic as low…

Machine Learning · Computer Science · 2024-12-03 · Brian Hsuan-Cheng Liao, Chih-Hong Cheng, Hasan Esen, Alois Knoll

Two Steps Forward and One Behind: Rethinking Time Series Forecasting with Deep Learning

The Transformer is a highly successful deep learning model that has revolutionised the world of artificial neural networks, first in natural language processing and later in computer vision. This model is based on the attention mechanism…

Machine Learning · Computer Science · 2023-05-09 · Riccardo Ughi, Eugenio Lomurno, Matteo Matteucci

Exploring the Performance and Efficiency of Transformer Models for NLP on Mobile Devices

Deep learning (DL) is characterised by its dynamic nature, with new deep neural network (DNN) architectures and approaches emerging every few years, driving the field's advancement. At the same time, the ever-increasing use of mobile…

Machine Learning · Computer Science · 2023-07-25 · Ioannis Panopoulos, Sokratis Nikolaidis, Stylianos I. Venieris, Iakovos S. Venieris

Transformers in Vision: A Survey

Astounding results from Transformer models on natural language tasks have intrigued the vision community to study their application to computer vision problems. Among their salient benefits, Transformers enable modeling long dependencies…

Computer Vision and Pattern Recognition · Computer Science · 2022-01-20 · Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, Mubarak Shah

Leaner Transformers: More Heads, Less Depth

Transformers have reshaped machine learning by utilizing attention mechanisms to capture complex patterns in large datasets, leading to significant improvements in performance. This success has contributed to the belief that "bigger means…

Machine Learning · Computer Science · 2025-05-28 · Hemanth Saratchandran, Damien Teney, Simon Lucey

A Survey on Efficient Training of Transformers

Recent advances in Transformers have come with a huge requirement on computing resources, highlighting the importance of developing efficient training techniques to make Transformer training faster, at lower cost, and to higher accuracy by…

Machine Learning · Computer Science · 2023-05-05 · Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen

Let's Think Dot by Dot: Hidden Computation in Transformer Language Models

Chain-of-thought responses from language models improve performance across most benchmarks. However, it remains unclear to what extent these performance gains can be attributed to human-like task decomposition or simply the greater…

Computation and Language · Computer Science · 2024-04-25 · Jacob Pfau, William Merrill, Samuel R. Bowman