Related papers: Efficient Attention using a Fixed-Size Memory Repr…

200 papers

The softmax content-based attention mechanism has proven to be very beneficial in many applications of recurrent neural networks. Nevertheless, it suffers from two major computational limitations. First, its computations for an attention…

Machine Learning · Computer Science · 2016-09-20 · Alexandre de Brébisson, Pascal Vincent
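The abstract above names the computational limits of softmax content-based attention but is cut off before the proposed fix. The following is a minimal sketch of the general idea. It assumes a cheaper variant that drops the softmax, so each score is a plain dot product. That assumption is ours and is not stated in the excerpt. The softmax lookup must keep and scan all T states for every query. The linear lookup instead folds the states once into a fixed d×d matrix C = Σᵢ hᵢhᵢᵀ and answers each query with one matrix-vector product. All function names are hypothetical.

```python
import math

def softmax_lookup(q, states):
    """Softmax attention: O(T*d) per query, and all T states must be kept."""
    scores = [sum(qi * hi for qi, hi in zip(q, h)) for h in states]
    m = max(scores)                        # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    d = len(q)
    return [sum(w * h[j] for w, h in zip(weights, states)) / z for j in range(d)]

def build_memory(states):
    """Fold the T states into a fixed-size d x d matrix C = sum_i h_i h_i^T."""
    d = len(states[0])
    C = [[0.0] * d for _ in range(d)]
    for h in states:
        for a in range(d):
            for b in range(d):
                C[a][b] += h[a] * h[b]
    return C

def linear_lookup(q, C):
    """Assumed softmax-free lookup: C q = sum_i (q . h_i) h_i, O(d^2) per query."""
    return [sum(C[a][b] * q[b] for b in range(len(q))) for a in range(len(C))]

states = [[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0]]   # toy hidden states h_i
q = [0.3, -0.2]                                  # toy query
C = build_memory(states)
print(linear_lookup(q, C), softmax_lookup(q, states))
```

C stays d×d however long the sequence is, and it can be updated incrementally as new states arrive. The trade-off is that the softmax's normalized, sharper weighting is lost.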

Encoder-decoder models have become an effective approach for sequence learning tasks like machine translation, image captioning and speech recognition, but have yet to show competitive results for handwritten text recognition. To this end,…

Computer Vision and Pattern Recognition · Computer Science · 2019-07-16 · Johannes Michael, Roger Labahn, Tobias Grüning, Jochen Zöllner

Transformer-based architectures have become the prevailing backbone of large language models. However, the quadratic time and memory complexity of self-attention remains a fundamental obstacle to efficient long-context modeling. To address…

Computation and Language · Computer Science · 2026-02-10 · Yutao Sun, Zhenyu Li, Yike Zhang, Tengyu Pan, Bowen Dong, Yuyi Guo, Jianyong Wang

Dot-product attention has wide applications in computer vision and natural language processing. However, its memory and computational costs grow quadratically with the input size. Such growth prohibits its application on high-resolution…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-22 · Zhuoran Shen, Mingyuan Zhang, Haiyu Zhao, Shuai Yi, Hongsheng Li
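Associativity shows why dot-product attention grows quadratically and how reordering the products can avoid it. The following is a minimal sketch with plain matrix products and no softmax. Omitting the softmax is an assumption for illustration, because the excerpt does not give the paper's normalization scheme. (QKᵀ)V materializes an n×n matrix, while Q(KᵀV) only ever builds a d×d one, and both orders give the same result. The helper names are hypothetical.

```python
def matmul(A, B):
    """Plain Python product of an (m x k) and a (k x p) list-of-lists matrix."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def quadratic_attention(Q, K, V):
    """(Q K^T) V: builds an n x n score matrix, O(n^2 d) time and O(n^2) memory."""
    scores = matmul(Q, transpose(K))
    return matmul(scores, V), (len(scores), len(scores[0]))

def reordered_attention(Q, K, V):
    """Q (K^T V): builds only a d x d matrix, O(n d^2) time and O(d^2) extra memory."""
    context = matmul(transpose(K), V)
    return matmul(Q, context), (len(context), len(context[0]))

n, d = 6, 2
Q = [[0.1 * (i + 1), -0.2 * i] for i in range(n)]
K = [[0.3 * i, 0.1 * (n - i)] for i in range(n)]
V = [[float(i), 1.0] for i in range(n)]
out_q, shape_q = quadratic_attention(Q, K, V)
out_r, shape_r = reordered_attention(Q, K, V)
print(shape_q, shape_r)   # intermediate sizes: n x n versus d x d
```

A row-wise softmax on QKᵀ breaks this reordering. Linear-complexity variants therefore have to replace or relocate the normalization rather than simply regroup the products.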

Transformer architectures have achieved state-of-the-art results on a variety of sequence modeling tasks. However, their attention mechanism comes with a quadratic complexity in sequence length, making the computational overhead…

Computation and Language · Computer Science · 2022-06-03 · Hao Peng, Jungo Kasai, Nikolaos Pappas, Dani Yogatama, Zhaofeng Wu, Lingpeng Kong, Roy Schwartz, Noah A. Smith

Attention-based Neural Machine Translation (NMT) models suffer from attention deficiency issues, as observed in recent research. We propose a novel mechanism to address some of these limitations and improve the NMT attention.…

Computation and Language · Computer Science · 2016-08-10 · Baskaran Sankaran, Haitao Mi, Yaser Al-Onaizan, Abe Ittycheriah

Many machine learning models use the manipulation of dimensions as a driving force to enable models to identify and learn important features in data. In the case of sequential data this manipulation usually happens on the token dimension…

Machine Learning · Computer Science · 2023-10-24 · Daniel Biermann, Fabrizio Palumbo, Morten Goodwin, Ole-Christoffer Granmo

Transformer-based models have emerged as one of the most widely used architectures for natural language processing, natural language generation, and image generation. The size of the state-of-the-art models has increased steadily, reaching…

Hardware Architecture · Computer Science · 2025-01-15 · Rya Sanovar, Srikant Bharadwaj, Renee St. Amant, Victor Rühle, Saravan Rajmohan

Large Vision-Language Models (VLMs) have achieved remarkable success in multi-modal reasoning, but their inference-time efficiency remains a significant challenge due to the memory overhead during decoding, especially when the query and…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-26 · Fatih Ilhan, Gaowen Liu, Ramana Rao Kompella, Selim Furkan Tekin, Tiansheng Huang, Zachary Yahn, Yichang Xu, Ling Liu

Recently, encoder-decoder neural networks have shown impressive performance on many sequence-related tasks. The architecture commonly uses an attentional mechanism which allows the model to learn alignments between the source and the target…

Computation and Language · Computer Science · 2017-11-06 · Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Efficient GPU inference for large language models remains challenging due to memory bandwidth limitations, particularly during data transfers between High Bandwidth Memory (HBM) and SRAM in attention computations. Approximate…

Machine Learning · Computer Science · 2025-06-06 · Nirav Koley, Prajwal Singhania, Abhinav Bhatele

Transformers have been successfully used in various fields and are becoming the standard tools in computer vision. However, self-attention, a core component of transformers, has a quadratic complexity problem, which limits the use of…

Computer Vision and Pattern Recognition · Computer Science · 2022-06-02 · Jiuk Hong, Chaehyeon Lee, Soyoun Bang, Heechul Jung

As the demand for processing extended textual data grows, the ability to handle long-range dependencies and maintain computational efficiency is more critical than ever. One of the key issues for long-sequence modeling using attention-based…

Computation and Language · Computer Science · 2025-05-26 · Aosong Feng, Rex Ying, Leandros Tassiulas

Token representation strategies within large-scale neural architectures often rely on contextually refined embeddings, yet conventional approaches seldom encode structured relationships explicitly within token interactions. Self-attention…

Computation and Language · Computer Science · 2025-03-27 · James Blades, Frederick Somerfield, William Langley, Susan Everingham, Maurice Witherington

We introduce a novel segmental-attention model for automatic speech recognition. We restrict the decoder attention to segments to avoid quadratic runtime of global attention, better generalize to long sequences, and eventually enable…

Computation and Language · Computer Science · 2022-10-27 · Albert Zeyer, Robin Schmitt, Wei Zhou, Ralf Schlüter, Hermann Ney
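The segmental-attention abstract restricts decoder attention to segments to avoid the quadratic runtime of global attention. The following is a minimal sketch of what that restriction means for a single decoder step. It assumes the segment boundaries `start` and `end` are simply given. The excerpt does not say how the model chooses them, so these values and the function name are hypothetical. Frames outside the segment are never scored and receive zero weight.

```python
import math

def segment_attention(q, keys, values, start, end):
    """Softmax attention restricted to encoder frames start..end-1.

    Frames outside the segment are never scored and get weight 0.0, so the
    cost per decoder step is O((end - start) * d) instead of O(T * d).
    """
    scores = [sum(a * b for a, b in zip(q, keys[t])) for t in range(start, end)]
    m = max(scores)                          # subtract the max for numerical stability
    exp_scores = [math.exp(s - m) for s in scores]
    z = sum(exp_scores)
    weights = [0.0] * len(keys)              # full-length view, zeros outside the segment
    for offset, e in enumerate(exp_scores):
        weights[start + offset] = e / z
    d_v = len(values[0])
    context = [sum(weights[t] * values[t][j] for t in range(start, end)) for j in range(d_v)]
    return context, weights

T = 8
keys = [[math.sin(t), math.cos(t)] for t in range(T)]   # toy encoder keys
values = [[float(t)] for t in range(T)]                  # toy encoder values
context, weights = segment_attention([1.0, 0.5], keys, values, start=2, end=5)
print(weights)
```

Because the softmax only ever sees the segment, the per-step cost depends on the segment length rather than on the full input length T.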

Slim attention shrinks the context memory size by 2x for transformer models with MHA (multi-head attention), which can speed up inference by up to 2x for large context windows. Slim attention is an exact, mathematically identical…

Machine Learning · Computer Science · 2025-06-04 · Nils Graef, Andrew Wasielewski
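The exactness claim in the Slim attention abstract can be illustrated under one assumption about how the method works. If the key projection W_K is square and invertible, as the stacked per-head projections are in standard MHA, then V = X·W_V = K·W_K⁻¹·W_V. A cache could therefore store K alone and rebuild V on demand, which halves the K/V context memory. The following is a minimal sketch with 2×2 weights and hypothetical helper names. It checks the reconstruction numerically and is not the authors' implementation.

```python
def matmul(A, B):
    """Plain Python product of an (m x k) and a (k x p) list-of-lists matrix."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def inverse_2x2(M):
    """Inverse of a 2x2 matrix; raises ValueError if it is singular."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("W_K must be invertible for V to be recoverable from K")
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical projection weights (assumed square: d_model = 2).
W_K = [[2.0, 1.0], [1.0, 3.0]]
W_V = [[0.5, -1.0], [1.5, 2.0]]
W_KV = matmul(inverse_2x2(W_K), W_V)   # precomputed once: W_K^{-1} W_V

X = [[1.0, 2.0], [-0.5, 0.25], [3.0, -1.0]]   # activations of three cached tokens
K_cache = matmul(X, W_K)                     # only K is stored in the cache
V_rebuilt = matmul(K_cache, W_KV)            # V recomputed from K when needed
V_exact = matmul(X, W_V)                     # reference: what a normal V cache holds
print(V_rebuilt, V_exact)
```

This design trades memory for an extra matrix multiply per lookup. It depends on W_K being invertible, which a square projection can satisfy but a projection to a smaller key dimension cannot.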

We propose Sparse Sinkhorn Attention, a new efficient and sparse method for learning to attend. Our method is based on differentiable sorting of internal representations. Concretely, we introduce a meta sorting network that learns to…

Machine Learning · Computer Science · 2020-02-27 · Yi Tay, Dara Bahri, Liu Yang, Donald Metzler, Da-Cheng Juan
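Differentiable sorting of the kind named in the Sparse Sinkhorn abstract is commonly built on Sinkhorn normalization. This operator alternately rescales the rows and columns of a positive matrix until it is close to doubly stochastic, which gives a soft, differentiable relaxation of a permutation matrix. The following is a minimal sketch of that generic operator only. The paper's meta sorting network, which would produce the logits, is not shown, and the logits below are made up for illustration.

```python
import math

def sinkhorn(logits, n_iters=200):
    """Sinkhorn normalization: alternately rescale rows and columns of exp(logits).

    For a square matrix with positive entries this converges to a doubly
    stochastic matrix (every row and every column sums to 1).
    """
    P = [[math.exp(x) for x in row] for row in logits]
    n = len(P)
    for _ in range(n_iters):
        for i in range(n):                       # row normalization
            s = sum(P[i])
            P[i] = [x / s for x in P[i]]
        for j in range(n):                       # column normalization
            s = sum(P[r][j] for r in range(n))
            for r in range(n):
                P[r][j] /= s
    return P

logits = [[2.0, 0.1, -1.0], [0.0, 1.5, 0.3], [-0.5, 0.2, 1.0]]
P = sinkhorn(logits)
print([round(x, 3) for row in P for x in row])
```

Practical implementations usually run these iterations in log space, using log-sum-exp for each normalization, for numerical stability. This sketch keeps the plain ratio form for readability.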

Recent work has shown that attention-based language models excel at recall, the ability to ground generations in tokens previously seen in context. However, the efficiency of attention-based models is bottlenecked during inference by the…

Computation and Language · Computer Science · 2025-03-10 · Simran Arora, Sabri Eyuboglu, Michael Zhang, Aman Timalsina, Silas Alberti, Dylan Zinsley, James Zou, Atri Rudra, Christopher Ré

The quadratic computational and memory complexities of large Transformers have limited their scalability for long document summarization. In this paper, we propose Hepos, a novel efficient encoder-decoder attention with head-wise positional…

Computation and Language · Computer Science · 2021-04-13 · Luyang Huang, Shuyang Cao, Nikolaus Parulian, Heng Ji, Lu Wang

Transformer-based models have brought a radical change to neural machine translation. A key feature of the Transformer architecture is the so-called multi-head attention mechanism, which allows the model to focus simultaneously on different…

Computation and Language · Computer Science · 2020-10-06 · Alessandro Raganato, Yves Scherrer, Jörg Tiedemann