English
Related papers

Related papers: Flowformer: Linearizing Transformers with Conserva…

200 papers

In the Transformer model, "self-attention" combines information from attended embeddings into the representation of the focal embedding in the next layer. Thus, across layers of the Transformer, information originating from different tokens…

Machine Learning · Computer Science 2020-06-02 Samira Abnar , Willem Zuidema

Transformer models have achieved remarkable results in a wide range of applications. However, their scalability is hampered by the quadratic time and memory complexity of the self-attention mechanism concerning the sequence length. This…

Machine Learning · Computer Science 2024-02-27 Yury Nahshan , Joseph Kampeas , Emir Haleva

The Transformer architecture has become a cornerstone of modern artificial intelligence, but its core self-attention mechanism suffers from a complexity bottleneck that scales quadratically with sequence length, severely limiting its…

Machine Learning · Computer Science 2025-08-29 Zhongpan Tang

Transformer-based deep learning models have achieved state-of-the-art performance across numerous language and vision tasks. While the self-attention mechanism, a core component of transformers, has proven capable of handling complex data…

Machine Learning · Computer Science 2025-08-05 Laziz Abdullaev , Tan M. Nguyen

The quadratic computation complexity of self-attention has been a persistent challenge when applying Transformer models to vision tasks. Linear attention, on the other hand, offers a much more efficient alternative with its linear…

Computer Vision and Pattern Recognition · Computer Science 2023-09-04 Dongchen Han , Xuran Pan , Yizeng Han , Shiji Song , Gao Huang

Large transformer models have shown extraordinary success in achieving state-of-the-art results in many natural language processing applications. However, training and deploying these models can be prohibitively costly for long sequences,…

Machine Learning · Computer Science 2020-06-16 Sinong Wang , Belinda Z. Li , Madian Khabsa , Han Fang , Hao Ma

The self-attention mechanism has been a key factor in the advancement of vision Transformers. However, its quadratic complexity imposes a heavy computational burden in high-resolution scenarios, restricting the practical application.…

Computer Vision and Pattern Recognition · Computer Science 2025-12-29 Dongchen Han , Tianyu Li , Ziyi Wang , Gao Huang

Transformers are built upon multi-head scaled dot-product attention and positional encoding, which aim to learn the feature representations and token dependencies. In this work, we focus on enhancing the distinctive representation by…

Computer Vision and Pattern Recognition · Computer Science 2022-07-12 Litao Yu , Jian Zhang

Transformer is a ubiquitous model for natural language processing and has attracted wide attentions in computer vision. The attention maps are indispensable for a transformer model to encode the dependencies among input tokens. However,…

Machine Learning · Computer Science 2021-02-26 Yujing Wang , Yaming Yang , Jiangang Bai , Mingliang Zhang , Jing Bai , Jing Yu , Ce Zhang , Gao Huang , Yunhai Tong

Transformers have achieved remarkable success in sequence modeling and beyond but suffer from quadratic computational and memory complexities with respect to the length of the input sequence. Leveraging techniques include sparse and linear…

Machine Learning · Computer Science 2022-08-02 Tan Nguyen , Richard G. Baraniuk , Robert M. Kirby , Stanley J. Osher , Bao Wang

Transformers have revolutionized deep learning in numerous fields, including natural language processing, computer vision, and audio processing. Their strength lies in their attention mechanism, which allows for the discovering of complex…

Machine Learning · Computer Science 2024-04-02 Uladzislau Yorsh , Martin Holeňa , Ondřej Bojar , David Herel

Transformers and their attention mechanism have been revolutionary in the field of Machine Learning. While originally proposed for the language data, they quickly found their way to the image, video, graph, etc. data modalities with various…

Machine Learning · Computer Science 2025-09-22 Saeed Amizadeh , Sara Abdali , Yinheng Li , Kazuhito Koishida

Motivated by the factorization inherent in the original fast multipole method and the improved fast Gauss transform we introduce a factorable form of attention that operates efficiently in high dimensions. This approach reduces the…

Machine Learning · Computer Science 2024-02-13 Armin Gerami , Monte Hoover , Pranav S. Dulepet , Ramani Duraiswami

The Transformer is a sequence model that forgoes traditional recurrent architectures in favor of a fully attention-based approach. Besides improving performance, an advantage of using attention is that it can also help to interpret a model…

Human-Computer Interaction · Computer Science 2019-06-14 Jesse Vig

Flow-based generative models have shown an excellent ability to explicitly learn the probability density function of data via a sequence of invertible transformations. Yet, learning attentions in generative flows remains understudied, while…

Machine Learning · Computer Science 2022-04-01 Rhea Sanjay Sukthanker , Zhiwu Huang , Suryansh Kumar , Radu Timofte , Luc Van Gool

Transformers have proven highly effective across modalities, but standard softmax attention scales quadratically with sequence length, limiting long context modeling. Linear attention mitigates this by approximating attention with kernel…

Machine Learning · Computer Science 2026-02-10 Ashkan Shahbazi , Chayne Thrash , Yikun Bai , Keaton Hamm , Navid NaderiAlizadeh , Soheil Kolouri

Transformers are increasingly adopted for modeling and forecasting time-series, yet their internal mechanisms remain poorly understood from a dynamical systems perspective. In contrast to classical autoregressive and state-space models,…

Machine Learning · Computer Science 2025-12-25 Gregory Duthé , Nikolaos Evangelou , Wei Liu , Ioannis G. Kevrekidis , Eleni Chatzi

Linear attention mechanisms have emerged as efficient alternatives to full self-attention in Graph Transformers, offering linear time complexity. However, existing linear attention models often suffer from a significant drop in…

Computer Vision and Pattern Recognition · Computer Science 2026-01-29 Zhaolin Hu , Kun Li , Hehe Fan , Yi Yang

Self-attention in Transformers relies on globally normalized softmax weights, causing all tokens to compete for influence at every layer. When composed across depth, this interaction pattern induces strong synchronization dynamics that…

Machine Learning · Computer Science 2026-05-26 Jingkun Liu , Yisong Yue , Max Welling , Yue Song

The Transformer is a highly successful deep learning model that has revolutionised the world of artificial neural networks, first in natural language processing and later in computer vision. This model is based on the attention mechanism…

Machine Learning · Computer Science 2023-05-09 Riccardo Ughi , Eugenio Lomurno , Matteo Matteucci
‹ Prev 1 2 3 10 Next ›