Related papers: Linearizing Large Language Models

200 papers

Transformers with linear recurrent modeling offer linear-time training and constant-memory inference. Despite their demonstrated efficiency and performance, pretraining such non-standard architectures from scratch remains costly and risky.…

Computation and Language · Computer Science 2025-05-08 Disen Lan, Weigao Sun, Jiaxi Hu, Jusen Du, Yu Cheng

Modern recurrent layers are emerging as a promising path toward edge deployment of foundation models, especially in the context of large language models (LLMs). Compressing the whole input sequence in a finite-dimensional representation…

Machine Learning · Computer Science 2024-07-18 Alessandro Pierro, Steven Abreu

We propose Lizard, a linearization framework that transforms pretrained Transformer-based Large Language Models (LLMs) into subquadratic architectures. Transformers face severe computational and memory bottlenecks with long sequences due…

Large Language Models (LLMs) have shown immense potential in enhancing various aspects of our daily lives, from conversational AI to search and AI assistants. However, their growing capabilities come at the cost of extremely large model…

Machine Learning · Computer Science 2025-02-27 Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, Yufa Zhou

Transformer-based large language models (LLMs) excel in modeling complex language patterns but face significant computational costs during inference, especially with long inputs due to the attention mechanism's memory overhead. We observe…

Computation and Language · Computer Science 2024-10-18 Ruiqing Yan, Linghan Zheng, Xingbo Du, Han Zou, Yufeng Guo, Jianfei Yang

Transformers are the current architecture of choice for NLP, but their attention layers do not scale well to long contexts. Recent works propose to replace attention with linear recurrent layers -- this is the case for state space models,…

Computation and Language · Computer Science 2024-07-09 Hugo Pitorro, Pavlo Vasylenko, Marcos Treviso, André F. T. Martins

Transformers with linear attention (i.e., linear transformers) and state-space models have recently been suggested as a viable linear-time alternative to transformers with softmax attention. However, these models still underperform…

Machine Learning · Computer Science 2025-01-16 Songlin Yang, Bailin Wang, Yu Zhang, Yikang Shen, Yoon Kim

Transformer-based embedding models suffer from quadratic computational and linear memory complexity, limiting their utility for long sequences. We propose recurrent architectures as an efficient alternative, introducing a vertically chunked…

Computation and Language · Computer Science 2026-04-21 Tobias Grantner, Emanuel Sallinger, Martin Flechl

Sequence modeling is currently dominated by causal transformer architectures that use softmax self-attention. Although widely adopted, transformers require scaling memory and compute linearly during inference. A recent stream of work…

Large Language Models (LLMs) demonstrate exceptional reasoning abilities, enabling strong generalization across diverse tasks such as commonsense reasoning and instruction following. However, as LLMs scale, inference costs become…

Computation and Language · Computer Science 2025-02-06 Rhea Sanjay Sukthanker, Benedikt Staffler, Frank Hutter, Aaron Klein

Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes available. A much more efficient solution is to continually pre-train these models, saving significant…

Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit…

Although large language models (LLMs) have achieved significant success in natural language processing, they still struggle with long-context comprehension. Traditional approaches to mitigating this issue typically rely on fine-tuning or…

Computation and Language · Computer Science 2025-02-25 Yifei Gao, Shaohong Chen, Lei Wang, Ruiting Dai, Ziyun Zhang, Kerui Ren, Jiaji Wu, Jun Cheng

Transformers have outperformed recurrent neural networks (RNNs) in natural language generation. But this comes with a significant computational cost, as the attention mechanism's complexity scales quadratically with sequence length.…

Computation and Language · Computer Science 2021-09-21 Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith

Linearization has emerged as a strategy for developing efficient language models (LMs). Starting from an existing Transformer-based LM, linearization replaces the attention component with computationally efficient subquadratic token…

Computation and Language · Computer Science 2026-02-02 Patrick Haller, Jonas Golde, Alan Akbik

In recent studies, linear recurrent neural networks (LRNNs) have achieved Transformer-level performance in natural language and long-range modeling, while offering rapid parallel training and constant inference cost. With the resurgence of…

Computation and Language · Computer Science 2024-04-10 Ting-Han Fan, Ta-Chung Chi, Alexander I. Rudnicky

While linear-complexity attention mechanisms offer a promising alternative to Softmax attention for overcoming the quadratic bottleneck, training such models from scratch remains prohibitively expensive. Inheriting weights from pretrained…

Computer Vision and Pattern Recognition · Computer Science 2026-05-29 Yining Li, Dongchen Han, Zeyu Liu, Hanyi Wang, Yulin Wang, Gao Huang

Large Language Models (LLMs) are known for their expensive and time-consuming training. Thus, oftentimes, LLMs are fine-tuned to address a specific task, given the pretrained weights of a pre-trained LLM considered a foundation model. In…

Computation and Language · Computer Science 2025-12-05 Eshed Gal, Moshe Eliasof, Javier Turek, Uri Ascher, Eran Treister, Eldad Haber

Despite the advantageous subquadratic complexity of modern recurrent deep learning models -- such as state-space models (SSMs) -- recent studies have highlighted their potential shortcomings compared to transformers on reasoning and…

Machine Learning · Computer Science 2025-10-13 Destiny Okpekpe, Antonio Orvieto

Recent works show we can linearize large language models (LLMs) -- swapping the quadratic attentions of popular Transformer-based LLMs with subquadratic analogs, such as linear attention -- avoiding the expensive pretraining costs. However,…

Machine Learning · Computer Science 2025-03-07 Michael Zhang, Simran Arora, Rahul Chalamala, Alan Wu, Benjamin Spector, Aaryan Singhal, Krithik Ramesh, Christopher Ré