Related papers: Linear Attention for Efficient Bidirectional Seque…

On The Application of Linear Attention in Multimodal Transformers

Multimodal Transformers serve as the backbone for state-of-the-art vision-language models, yet their quadratic attention complexity remains a critical barrier to scalability. In this work, we investigate the viability of Linear Attention…

Computer Vision and Pattern Recognition · Computer Science 2026-04-14 Armin Gerami, Seyedehanita Madani, Ramani Duraiswami

LAWCAT: Efficient Distillation from Quadratic to Linear Attention with Convolution across Tokens for Long Context Modeling

Although transformer architectures have achieved state-of-the-art performance across diverse domains, their quadratic computational complexity with respect to sequence length remains a significant bottleneck, particularly for…

Computation and Language · Computer Science 2025-11-05 Zeyu Liu, Souvik Kundu, Lianghao Jiang, Anni Li, Srikanth Ronanki, Sravan Bodapati, Gourav Datta, Peter A. Beerel

Log-Linear Attention

The attention mechanism in Transformers is an important primitive for accurate and scalable sequence modeling. Its quadratic-compute and linear-memory complexity however remain significant bottlenecks. Linear attention and state-space…

Machine Learning · Computer Science 2026-03-03 Han Guo, Songlin Yang, Tarushii Goel, Eric P. Xing, Tri Dao, Yoon Kim

UniLION: Towards Unified Autonomous Driving Model with Linear Group RNNs

Although transformers have demonstrated remarkable capabilities across various domains, their quadratic attention mechanisms introduce significant computational overhead when processing long-sequence data. In this paper, we present a…

Computer Vision and Pattern Recognition · Computer Science 2025-11-04 Zhe Liu, Jinghua Hou, Xiaoqing Ye, Jingdong Wang, Hengshuang Zhao, Xiang Bai

FLatten Transformer: Vision Transformer using Focused Linear Attention

The quadratic computation complexity of self-attention has been a persistent challenge when applying Transformer models to vision tasks. Linear attention, on the other hand, offers a much more efficient alternative with its linear…

Computer Vision and Pattern Recognition · Computer Science 2023-09-04 Dongchen Han, Xuran Pan, Yizeng Han, Shiji Song, Gao Huang

MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map

Various linear complexity models, such as Linear Transformer (LinFormer), State Space Model (SSM), and Linear RNN (LinRNN), have been proposed to replace the conventional softmax attention in Transformer structures. However, the optimal…

Machine Learning · Computer Science 2024-11-19 Yuhong Chou, Man Yao, Kexin Wang, Yuqi Pan, Ruijie Zhu, Yiran Zhong, Yu Qiao, Jibin Wu, Bo Xu, Guoqi Li

Bridging the Divide: Reconsidering Softmax and Linear Attention

Widely adopted in modern Vision Transformer designs, Softmax attention can effectively capture long-range visual information; however, it incurs excessive computational cost when dealing with high-resolution inputs. In contrast, linear…

Computer Vision and Pattern Recognition · Computer Science 2024-12-10 Dongchen Han, Yifan Pu, Zhuofan Xia, Yizeng Han, Xuran Pan, Xiu Li, Jiwen Lu, Shiji Song, Gao Huang

LION: Implicit Vision Prompt Tuning

Despite recent competitive performance across a range of vision tasks, vision Transformers still have an issue of heavy computational costs. Recently, vision prompt learning has provided an economic solution to this problem without…

Computer Vision and Pattern Recognition · Computer Science 2024-03-28 Haixin Wang, Jianlong Chang, Xiao Luo, Jinan Sun, Zhouchen Lin, Qi Tian

Luna: Linear Unified Nested Attention

The quadratic computational and memory complexities of the Transformer's attention mechanism have limited its scalability for modeling long sequences. In this paper, we propose Luna, a linear unified nested attention mechanism that…

Machine Learning · Computer Science 2021-11-04 Xuezhe Ma, Xiang Kong, Sinong Wang, Chunting Zhou, Jonathan May, Hao Ma, Luke Zettlemoyer

LT2: Linear-Time Looped Transformers

Looped Transformers (LT) have emerged as a powerful architecture by iterating their layers multiple times before decoding the final token. However, pairing them with full attention retains quadratic complexity, making them computationally…

Machine Learning · Computer Science 2026-05-26 Chunyuan Deng, Yizhe Zhang, Rui-Jie Zhu, Yuanyuan Xu, Jiarui Liu, T. S. Eugene Ng, Hanjie Chen

A Practical Survey on Faster and Lighter Transformers

Recurrent neural networks are effective models to process sequences. However, they are unable to learn long-term dependencies because of their inherent sequential nature. As a solution, Vaswani et al. introduced the Transformer, a model…

Machine Learning · Computer Science 2023-03-28 Quentin Fournier, Gaétan Marceau Caron, Daniel Aloise

Bidirectional Linear Recurrent Models for Sequence-Level Multisource Fusion

Sequence modeling is a critical yet challenging task with wide-ranging applications, especially in time series forecasting for domains like weather prediction, temperature monitoring, and energy load forecasting. Transformers, with their…

Machine Learning · Computer Science 2025-04-15 Qisai Liu, Zhanhong Jiang, Joshua R. Waite, Chao Liu, Aditya Balu, Soumik Sarkar

You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet

Linear attention mechanisms have gained prominence in causal language models due to their linear computational complexity and enhanced speed. However, the inherent decay mechanism in linear attention presents challenges when applied to…

Computation and Language · Computer Science 2024-06-03 Zhen Qin, Yuxin Mao, Xuyang Shen, Dong Li, Jing Zhang, Yuchao Dai, Yiran Zhong

LION: Linear Group RNN for 3D Object Detection in Point Clouds

The benefit of transformers in large-scale 3D point cloud perception tasks, such as 3D object detection, is limited by their quadratic computation cost when modeling long-range relationships. In contrast, linear RNNs have low computational…

Computer Vision and Pattern Recognition · Computer Science 2024-07-26 Zhe Liu, Jinghua Hou, Xinyu Wang, Xiaoqing Ye, Jingdong Wang, Hengshuang Zhao, Xiang Bai

A Cheap Linear Attention Mechanism with Fast Lookups and Fixed-Size Representations

The softmax content-based attention mechanism has proven to be very beneficial in many applications of recurrent neural networks. Nevertheless it suffers from two major computational limitations. First, its computations for an attention…

Machine Learning · Computer Science 2016-09-20 Alexandre de Brébisson, Pascal Vincent

MB-TaylorFormer V2: Improved Multi-branch Linear Transformer Expanded by Taylor Formula for Image Restoration

Recently, Transformer networks have demonstrated outstanding performance in the field of image restoration due to the global receptive field and adaptability to input. However, the quadratic computational complexity of Softmax-attention…

Computer Vision and Pattern Recognition · Computer Science 2025-04-15 Zhi Jin, Yuwei Qiu, Kaihao Zhang, Hongdong Li, Wenhan Luo

LION: Latent Point Diffusion Models for 3D Shape Generation

Denoising diffusion models (DDMs) have shown promising results in 3D point cloud synthesis. To advance 3D DDMs and make them useful for digital artists, we require (i) high generation quality, (ii) flexibility for manipulation and…

Computer Vision and Pattern Recognition · Computer Science 2022-10-14 Xiaohui Zeng, Arash Vahdat, Francis Williams, Zan Gojcic, Or Litany, Sanja Fidler, Karsten Kreis

Exact Linear Attention

This paper introduces Exact Linear Attention (ELA), a mechanism that achieves linear computational complexity for Transformer attention by exploiting the exact decomposition property of kernel functions, thereby eliminating approximation…

Machine Learning · Computer Science 2026-05-21 Weinuo Ou

Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

Linear attention is an efficient attention mechanism that has recently emerged as a promising alternative to conventional softmax attention. With its ability to process tokens in linear computational complexities, linear attention, in…

Computation and Language · Computer Science 2024-01-17 Zhen Qin, Weigao Sun, Dong Li, Xuyang Shen, Weixuan Sun, Yiran Zhong

CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up

Diffusion Transformers (DiT) have become a leading architecture in image generation. However, the quadratic complexity of attention mechanisms, which are responsible for modeling token-wise relationships, results in significant latency when…

Computer Vision and Pattern Recognition · Computer Science 2024-12-23 Songhua Liu, Zhenxiong Tan, Xinchao Wang