Related papers: Augmenting Self-attention with Persistent Memory

200 papers

Recently, self-attention models such as Transformers have given competitive results compared to recurrent neural network systems in speech recognition. The key factor for the outstanding performance of self-attention models is their ability…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-29 Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals

The Transformer is a ubiquitous model for natural language processing and has attracted wide attention in computer vision. The attention maps are indispensable for a transformer model to encode the dependencies among input tokens. However,…

Machine Learning · Computer Science 2021-02-26 Yujing Wang, Yaming Yang, Jiangang Bai, Mingliang Zhang, Jing Bai, Jing Yu, Ce Zhang, Gao Huang, Yunhai Tong

Feed-forward layers constitute two-thirds of a transformer model's parameters, yet their role in the network remains under-explored. We show that feed-forward layers in transformer-based language models operate as key-value memories, where…

Computation and Language · Computer Science 2021-09-07 Mor Geva, Roei Schuster, Jonathan Berant, Omer Levy

Recent research suggests that the feed-forward module within Transformers can be viewed as a collection of key-value memories, where the keys learn to capture specific patterns from the input based on the training examples. The values then…

Computation and Language · Computer Science 2023-10-25 Sunit Bhattacharya, Ondrej Bojar

This work introduces a novel Retention Layer mechanism for Transformer based architectures, addressing their inherent lack of intrinsic retention capabilities. Unlike human cognition, which can encode and dynamically recall symbolic…

Machine Learning · Computer Science 2025-01-17 M. Murat Yaslioglu

Self-attention is key to the remarkable success of transformers in sequence modeling tasks including many applications in natural language processing and computer vision. Like neural network layers, these attention mechanisms are often…

Machine Learning · Computer Science 2024-06-21 Tan M. Nguyen, Tam Nguyen, Nhat Ho, Andrea L. Bertozzi, Richard G. Baraniuk, Stanley J. Osher

The key to a Transformer model is the self-attention mechanism, which allows the model to analyze an entire sequence in a computationally efficient manner. Recent work has suggested the possibility that general attention mechanisms used by…

Machine Learning · Computer Science 2020-01-01 Thomas Dowdell, Hongyu Zhang

Recently, the Transformer model that is based solely on attention mechanisms, has advanced the state-of-the-art on various machine translation tasks. However, recent studies reveal that the lack of recurrence hinders its further improvement…

Computation and Language · Computer Science 2019-04-08 Jie Hao, Xing Wang, Baosong Yang, Longyue Wang, Jinfeng Zhang, Zhaopeng Tu

Transformer-based models have achieved state-of-the-art results in many natural language processing tasks. The self-attention architecture allows transformer to combine information from all elements of a sequence into context-aware…

Computation and Language · Computer Science 2021-02-17 Mikhail S. Burtsev, Yuri Kuratov, Anton Peganov, Grigory V. Sapunov

The Transformer model is widely used in natural language processing for sentence representation. However, the previous Transformer-based models focus on function words that have limited meaning in most cases and could merely extract…

Computation and Language · Computer Science 2021-07-05 Yu Shi

Self-attention models have shown their flexibility in parallel computation and their effectiveness in modeling both long- and short-term dependencies. However, it calculates the dependencies between representations without considering the…

Computation and Language · Computer Science 2019-02-18 Baosong Yang, Jian Li, Derek Wong, Lidia S. Chao, Xing Wang, Zhaopeng Tu

Transformers are one of the most successful architectures of modern neural networks. At their core there is the so-called attention mechanism, which recently interested the physics community as it can be written as the derivative of an…

Machine Learning · Computer Science 2024-09-25 Francesco D'Amico, Matteo Negri

Multilayer transformer networks consist of interleaved self-attention and feedforward sublayers. Could ordering the sublayers in a different pattern lead to better performance? We generate randomly ordered transformers and train them with…

Computation and Language · Computer Science 2020-04-24 Ofir Press, Noah A. Smith, Omer Levy

Neural networks using transformer-based architectures have recently demonstrated great power and flexibility in modeling sequences of many types. One of the core components of transformer networks is the attention layer, which allows…

Machine Learning · Computer Science 2019-07-16 Matthew Spellings

Deep attention models have advanced the modelling of sequential data across many domains. For language modelling in particular, the Transformer-XL -- a Transformer augmented with a long-range memory of past activations -- has been shown to…

Machine Learning · Computer Science 2020-07-08 Jack W. Rae, Ali Razavi

Transformer-based pre-trained models have gained much advance in recent years, becoming one of the most important backbones in natural language processing. Recent work shows that the attention mechanism inside Transformer may not be…

Computation and Language · Computer Science 2022-10-27 Yile Wang, Linyi Yang, Zhiyang Teng, Ming Zhou, Yue Zhang

Retentive Network (RetNet) represents a significant advancement in neural network architecture, offering an efficient alternative to the Transformer. While Transformers rely on self-attention to model dependencies, they suffer from high…

Computation and Language · Computer Science 2025-06-10 Haiqi Yang, Zhiyuan Li, Yi Chang, Yuan Wu

Although Transformers with fully connected self-attentions are powerful to model long-term dependencies, they are struggling to scale to long texts with thousands of words in language modeling. One of the solutions is to equip the model…

Computation and Language · Computer Science 2022-04-27 Haozhe Ji, Rongsheng Zhang, Zhenyu Yang, Zhipeng Hu, Minlie Huang

The Transformer is an extremely powerful and prominent deep learning architecture. In this work, we challenge the commonly held belief in deep learning that going deeper is better, and show an alternative design approach that is building…

Machine Learning · Computer Science 2022-11-10 Jason Ross Brown, Yiren Zhao, Ilia Shumailov, Robert D Mullins

Recent advancements in attention mechanisms have replaced recurrent neural networks and their variants for machine translation tasks. Transformers using solely attention mechanisms have achieved state-of-the-art results in sequence modeling. Neural…

Computation and Language · Computer Science 2020-04-02 Prakhar Thapak, Prodip Hore