Related papers: Softmax-free Linear Transformers

SOFT: Softmax-free Transformer with Linear Complexity

Vision transformers (ViTs) have pushed the state-of-the-art for various visual recognition tasks by patch-wise image tokenization followed by self-attention. However, the employment of self-attention modules results in a quadratic…

Computer Vision and Pattern Recognition · Computer Science 2022-05-03 Jiachen Lu, Jinghan Yao, Junge Zhang, Xiatian Zhu, Hang Xu, Weiguo Gao, Chunjing Xu, Tao Xiang, Li Zhang

Soft Error Reliability Analysis of Vision Transformers

Vision Transformers (ViTs) that leverage self-attention mechanism have shown superior performance on many classical vision tasks compared to convolutional neural networks (CNNs) and have gained increasing popularity recently. Existing ViT works…

Cryptography and Security · Computer Science 2024-04-29 Xinghua Xue, Cheng Liu, Ying Wang, Bing Yang, Tao Luo, Lei Zhang, Huawei Li, Xiaowei Li

X-ViT: High Performance Linear Vision Transformer without Softmax

Vision transformers have become one of the most important models for computer vision tasks. Although they outperform prior works, they require heavy computational resources on a scale that is quadratic in the number of tokens, $N$. This is…

Computer Vision and Pattern Recognition · Computer Science 2022-05-30 Jeonggeun Song, Heung-Chang Lee

The Linear Attention Resurrection in Vision Transformer

Vision Transformers (ViTs) have recently taken computer vision by storm. However, the softmax attention underlying ViTs comes with a quadratic complexity in time and memory, hindering the application of ViTs to high-resolution images. We…

Computer Vision and Pattern Recognition · Computer Science 2025-02-17 Chuanyang Zheng

Compute-Efficient Medical Image Classification with Softmax-Free Transformers and Sequence Normalization

The Transformer model has been pivotal in advancing fields such as natural language processing, speech recognition, and computer vision. However, a critical limitation of this model is its quadratic computational and memory complexity…

Computer Vision and Pattern Recognition · Computer Science 2024-06-04 Firas Khader, Omar S. M. El Nahhas, Tianyu Han, Gustav Müller-Franzes, Sven Nebelung, Jakob Nikolas Kather, Daniel Truhn

Self-Attention And Beyond the Infinite: Towards Linear Transformers with Infinite Self-Attention

The quadratic cost of softmax attention limits Transformer scalability in high-resolution vision. We introduce Infinite Self-Attention (InfSA), a spectral reformulation that treats each attention layer as a diffusion step on a…

Computer Vision and Pattern Recognition · Computer Science 2026-03-31 Giorgio Roffo, Hazem Abdelkawy, Nilli Lavie, Luke Palmer

Vicinity Vision Transformer

Vision transformers have shown great success on numerous computer vision tasks. However, their central component, softmax attention, prohibits vision transformers from scaling up to high-resolution images, due to both the computational…

Computer Vision and Pattern Recognition · Computer Science 2023-07-21 Weixuan Sun, Zhen Qin, Hui Deng, Jianyuan Wang, Yi Zhang, Kaihao Zhang, Nick Barnes, Stan Birchfield, Lingpeng Kong, Yiran Zhong

FLatten Transformer: Vision Transformer using Focused Linear Attention

The quadratic computation complexity of self-attention has been a persistent challenge when applying Transformer models to vision tasks. Linear attention, on the other hand, offers a much more efficient alternative with its linear…

Computer Vision and Pattern Recognition · Computer Science 2023-09-04 Dongchen Han, Xuran Pan, Yizeng Han, Shiji Song, Gao Huang
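
The entry above contrasts the quadratic cost of self-attention with linear attention. As a minimal sketch of where the linear cost comes from, the snippet below assumes a generic kernelized linear attention with the feature map elu(x) + 1; this is an illustrative choice, not the focused linear attention proposed in FLatten. It computes the same normalized attention output two ways: by evaluating every entry of the N × N kernel matrix, and by reassociating the product so that only a d × d_v summary of the keys and values is built. The names `phi`, `quadratic_attention` and `linear_attention` are hypothetical.

```python
import math

def phi(x):
    """Hypothetical positive feature map elu(x) + 1, applied elementwise."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def quadratic_attention(Q, K, V):
    """Evaluates all N x N kernel entries phi(q_i) . phi(k_j): O(N^2 * d) time."""
    fk = [phi(k) for k in K]
    dv = len(V[0])
    out = []
    for q in Q:
        fq = phi(q)
        weights = [dot(fq, kj) for kj in fk]  # one row of the N x N kernel matrix
        z = sum(weights)                       # row normalizer
        out.append([sum(w * v[c] for w, v in zip(weights, V)) / z for c in range(dv)])
    return out

def linear_attention(Q, K, V):
    """Computes phi(Q) (phi(K)^T V) instead of (phi(Q) phi(K)^T) V: O(N * d * d_v) time."""
    fk = [phi(k) for k in K]
    n, d, dv = len(K), len(fk[0]), len(V[0])
    # S = sum_j phi(k_j) v_j^T: a d x d_v matrix whose size does not depend on N
    S = [[sum(fk[j][a] * V[j][c] for j in range(n)) for c in range(dv)] for a in range(d)]
    # zsum = sum_j phi(k_j): a length-d vector used for the normalizer
    zsum = [sum(fk[j][a] for j in range(n)) for a in range(d)]
    out = []
    for q in Q:
        fq = phi(q)
        z = dot(fq, zsum)
        out.append([sum(fq[a] * S[a][c] for a in range(d)) / z for c in range(dv)])
    return out

# Toy example: N = 3 tokens, d = d_v = 2.
Q = [[0.1, -0.2], [0.3, 0.5], [-0.4, 0.2]]
K = [[0.2, 0.1], [-0.1, 0.4], [0.5, -0.3]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
a = quadratic_attention(Q, K, V)
b = linear_attention(Q, K, V)
print(max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)))  # differs only by rounding
```

Only the order of multiplication changes between the two functions, so their outputs agree up to floating-point rounding, while the cost of `linear_attention` grows linearly with N instead of quadratically.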

Linear Video Transformer with Feature Fixation

Vision Transformers have achieved impressive performance in video classification, while suffering from the quadratic complexity caused by the Softmax attention mechanism. Some studies alleviate the computational costs by reducing the number…

Computer Vision and Pattern Recognition · Computer Science 2022-10-18 Kaiyue Lu, Zexiang Liu, Jianyuan Wang, Weixuan Sun, Zhen Qin, Dong Li, Xuyang Shen, Hui Deng, Xiaodong Han, Yuchao Dai, Yiran Zhong

SeTformer is What You Need for Vision and Language

The dot product self-attention (DPSA) is a fundamental component of transformers. However, scaling them to long sequences, like documents or high-resolution images, becomes prohibitively expensive due to quadratic time and memory…

Computer Vision and Pattern Recognition · Computer Science 2024-01-09 Pourya Shamsolmoali, Masoumeh Zareapoor, Eric Granger, Michael Felsberg

Linearizing Vision Transformer with Test-Time Training

While linear-complexity attention mechanisms offer a promising alternative to Softmax attention for overcoming the quadratic bottleneck, training such models from scratch remains prohibitively expensive. Inheriting weights from pretrained…

Computer Vision and Pattern Recognition · Computer Science 2026-05-29 Yining Li, Dongchen Han, Zeyu Liu, Hanyi Wang, Yulin Wang, Gao Huang

ViTALiTy: Unifying Low-rank and Sparse Approximation for Vision Transformer Acceleration with a Linear Taylor Attention

Vision Transformer (ViT) has emerged as a competitive alternative to convolutional neural networks for various computer vision applications. Specifically, ViT multi-head attention layers make it possible to embed information globally across…

Computer Vision and Pattern Recognition · Computer Science 2022-11-10 Jyotikrishna Dass, Shang Wu, Huihong Shi, Chaojian Li, Zhifan Ye, Zhongfeng Wang, Yingyan Lin

Vision Xformers: Efficient Attention for Image Classification

Although transformers have become the neural architectures of choice for natural language processing, they require orders of magnitude more training data, GPU memory, and computations in order to compete with convolutional neural networks…

Computer Vision and Pattern Recognition · Computer Science 2021-10-04 Pranav Jeevan, Amit Sethi

Trio-ViT: Post-Training Quantization and Acceleration for Softmax-Free Efficient Vision Transformer

Motivated by the huge success of Transformers in the field of natural language processing (NLP), Vision Transformers (ViTs) have been rapidly developed and achieved remarkable performance in various computer vision tasks. However, their…

Computer Vision and Pattern Recognition · Computer Science 2024-10-01 Huihong Shi, Haikuo Shao, Wendong Mao, Zhongfeng Wang

MB-TaylorFormer V2: Improved Multi-branch Linear Transformer Expanded by Taylor Formula for Image Restoration

Recently, Transformer networks have demonstrated outstanding performance in the field of image restoration due to the global receptive field and adaptability to input. However, the quadratic computational complexity of Softmax-attention…

Computer Vision and Pattern Recognition · Computer Science 2025-04-15 Zhi Jin, Yuwei Qiu, Kaihao Zhang, Hongdong Li, Wenhan Luo

Understanding The Robustness in Vision Transformers

Recent studies show that Vision Transformers (ViTs) exhibit strong robustness against various corruptions. Although this property is partly attributed to the self-attention mechanism, there is still a lack of systematic understanding. In…

Computer Vision and Pattern Recognition · Computer Science 2022-11-09 Daquan Zhou, Zhiding Yu, Enze Xie, Chaowei Xiao, Anima Anandkumar, Jiashi Feng, Jose M. Alvarez

Bridging the Divide: Reconsidering Softmax and Linear Attention

Widely adopted in modern Vision Transformer designs, Softmax attention can effectively capture long-range visual information; however, it incurs excessive computational cost when dealing with high-resolution inputs. In contrast, linear…

Computer Vision and Pattern Recognition · Computer Science 2024-12-10 Dongchen Han, Yifan Pu, Zhuofan Xia, Yizeng Han, Xuran Pan, Xiu Li, Jiwen Lu, Shiji Song, Gao Huang

Sparse and Structured Visual Attention

Visual attention mechanisms are widely used in multimodal tasks, such as visual question answering (VQA). One drawback of softmax-based attention mechanisms is that they assign some probability mass to all image regions, regardless of their…

Computation and Language · Computer Science 2021-07-09 Pedro Henrique Martins, Vlad Niculae, Zita Marinho, André Martins
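
The entry above notes that softmax attention assigns some probability mass to every image region. The minimal sketch below illustrates that property and contrasts it with sparsemax, used here as a standard example of a normalizer that can assign exactly zero weight; the mechanisms proposed in the paper itself are not described in the truncated abstract. The four scores are hypothetical attention logits over four image regions.

```python
import math

def softmax(z):
    """Numerically stable softmax: every output is strictly positive."""
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex (sparsemax)."""
    zs = sorted(z, reverse=True)
    cumsum = 0.0
    k, support_sum = 0, 0.0
    for i, zi in enumerate(zs, start=1):
        cumsum += zi
        # The support is the largest prefix k with 1 + k * z_(k) > sum of the top k scores.
        if 1 + i * zi > cumsum:
            k, support_sum = i, cumsum
    tau = (support_sum - 1) / k  # threshold subtracted from every score
    return [max(x - tau, 0.0) for x in z]

scores = [1.5, 1.0, 0.1, -1.0]  # hypothetical logits for four regions
print(softmax(scores))    # all four regions receive nonzero weight
print(sparsemax(scores))  # [0.75, 0.25, 0.0, 0.0]
```

Both functions return a probability distribution, but sparsemax zeroes out the two low-scoring regions, whereas softmax spreads a small amount of mass onto them.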

UFO-ViT: High Performance Linear Vision Transformer without Softmax

Vision transformers have become one of the most important models for computer vision tasks. Although they outperform prior works, they require heavy computational resources on a scale that is quadratic in $N$. This is a major drawback of…

Computer Vision and Pattern Recognition · Computer Science 2021-11-05 Jeong-geun Song

Cottention: Linear Transformers With Cosine Attention

Attention mechanisms, particularly softmax attention, have been instrumental in the success of transformer-based models such as GPT. However, the quadratic memory complexity of softmax attention with respect to sequence length poses…

Machine Learning · Computer Science 2026-02-20 Gabriel Mongaras, Trevor Dohm, Eric C. Larson