Related papers: Mesa: A Memory-saving Training Framework for Trans…

200 papers

Training convolutional neural network models is memory intensive since back-propagation requires storing activations of all intermediate layers. This presents a practical concern when seeking to deploy very deep architectures in production,…

Machine Learning · Computer Science 2019-10-30 Ayan Chakrabarti , Benjamin Moseley

Fine-tuning provides an effective means to specialize pre-trained models for various downstream tasks. However, fine-tuning often incurs high memory overhead, especially for large transformer-based models, such as LLMs. While existing…

Computation and Language · Computer Science 2025-02-03 Antoine Simoulin , Namyong Park , Xiaoyi Liu , Grey Yang

At present, the mechanisms of in-context learning in Transformers are not well understood and remain mostly an intuition. In this paper, we suggest that training Transformers on auto-regressive objectives is closely related to…

In modern neural networks like Transformers, linear layers require significant memory to store activations during backward pass. This study proposes a memory reduction approach to perform backpropagation through linear layers. Since the…

Machine Learning · Computer Science 2022-02-04 Daniel Bershatsky , Aleksandr Mikhalev , Alexandr Katrutsa , Julia Gusak , Daniil Merkulov , Ivan Oseledets

Some autoregressive models exhibit in-context learning capabilities: being able to learn as an input sequence is processed, without undergoing any parameter changes, and without being explicitly trained to do so. The origins of this…

Autoregressively trained transformers have brought a profound revolution to the world, especially with their in-context learning (ICL) ability to address downstream tasks. Recently, several studies suggest that transformers learn a…

Machine Learning · Computer Science 2024-10-29 Chenyu Zheng , Wei Huang , Rongzhen Wang , Guoqiang Wu , Jun Zhu , Chongxuan Li

The impact of transformer networks is booming, yet, they come with significant computational complexity. It is therefore essential to understand how to optimally map and execute these networks on modern neural processor hardware. So far,…

Hardware Architecture · Computer Science 2024-06-17 Steven Colleman , Arne Symons , Victor J. B. Jung , Marian Verhelst

Recent advancements in on-device training for deep neural networks have underscored the critical need for efficient activation compression to overcome the memory constraints of mobile and edge devices. As activations dominate memory usage…

Networking and Internet Architecture · Computer Science 2025-07-11 Renyuan Liu , Yuyang Leng , Kaiyan Liu , Shaohan Hu , Chun-Fu Chen , Peijun Zhao , Heechul Yun , Shuochao Yao

In pursuit of faster computation, Efficient Transformers demonstrate an impressive variety of approaches -- models attaining sub-quadratic attention complexity can utilize a notion of sparsity or a low-rank approximation of inputs to reduce…

Machine Learning · Computer Science 2022-11-09 Uladzislau Yorsh , Alexander Kovalenko

Recent advances in Transformers have come with a huge requirement on computing resources, highlighting the importance of developing efficient training techniques to make Transformer training faster, at lower cost, and to higher accuracy by…

Machine Learning · Computer Science 2023-05-05 Bohan Zhuang , Jing Liu , Zizheng Pan , Haoyu He , Yuetian Weng , Chunhua Shen

Training large transformer models is one of the most important computational challenges of modern AI. In this paper, we show how to significantly accelerate training of large transformer models by reducing activation recomputation.…

Machine Learning · Computer Science 2022-05-12 Vijay Korthikanti , Jared Casper , Sangkug Lym , Lawrence McAfee , Michael Andersch , Mohammad Shoeybi , Bryan Catanzaro

Transformer-based deep learning models are increasingly deployed on energy- and DRAM-bandwidth-constrained devices such as laptops and gaming consoles, which presents significant challenges in meeting the latency requirements of the models.…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-26 Aadesh Deshmukh , Venkata Yaswanth Raparti , Samuel Hsu

Continuously adapting pre-trained models to local data on resource constrained edge devices is the last mile for model deployment. However, as models increase in size and depth, backpropagation requires a large amount of memory,…

Machine Learning · Computer Science 2024-11-07 Chen Feng , Shaojie Zhuo , Xiaopeng Zhang , Ramchalam Kinattinkara Ramakrishnan , Zhaocong Yuan , Andrew Zou Li

Transformers have achieved remarkable successes across a wide range of applications, yet the theoretical foundation of their model efficiency remains underexplored. In this work, we investigate how the model parameters -- mainly attention…

Machine Learning · Computer Science 2025-10-07 Ruoxi Yu , Haotian Jiang , Jingpu Cheng , Penghao Yu , Qianxiao Li , Zhong Li

Transformer models gain popularity because of their superior inference accuracy and inference throughput. However, the transformer is computation-intensive, causing a long inference time. The existing works on transformer inference…

Performance · Computer Science 2023-04-19 Yuan Feng , Hyeran Jeon , Filip Blagojevic , Cyril Guyot , Qing Li , Dong Li

With the increasing complexity of generative AI models, post-training quantization (PTQ) has emerged as a promising solution for deploying hyper-scale models on edge devices such as mobile and TVs. Existing PTQ schemes, however, consume…

Machine Learning · Computer Science 2024-11-06 Junhan Kim , Chungman Lee , Eulrang Cho , Kyungphil Park , Ho-young Kim , Joonyoung Kim , Yongkweon Jeon

The substantial memory demands of pre-training and fine-tuning large language models (LLMs) require memory-efficient optimization algorithms. One promising approach is layer-wise optimization, which treats each transformer block as a single…

Machine Learning · Computer Science 2026-01-15 Yuxi Liu , Renjia Deng , Yutong He , Xue Wang , Tao Yao , Kun Yuan

Transformer-based models have become the de facto backbone across many fields, such as computer vision and natural language processing. However, as these models scale in size, external memory access (EMA) for weight and activations…

Machine Learning · Computer Science 2025-03-26 Tseng-Jen Li , Tian-Sheuan Chang

The increasing scale of Transformer models has led to an increase in their pre-training computational requirements. While quantization has proven to be effective after pre-training and during fine-tuning, applying quantization in…

Machine Learning · Computer Science 2024-10-14 Kamran Chitsaz , Quentin Fournier , Gonçalo Mordido , Sarath Chandar

The scaling of neural networks with increasing data and model sizes necessitates the development of more efficient deep learning algorithms. A significant challenge in neural network training is the memory footprint associated with…

Machine Learning · Computer Science 2024-10-08 Georgii Novikov , Ivan Oseledets