Related papers: Attention Approximates Sparse Distributed Memory

200 papers

The Sparse Distributed Memory (SDM) proposed by Pentti Kanerva was thought to be a model of human long-term memory. The SDM architecture allows binary patterns to be stored and then retrieved using partially matching…

Computer Vision and Pattern Recognition · Computer Science 2012-07-30 Lou Marvin Caraig

The Transformer architecture has had a long and successful run in the fields of Deep Learning (DL) and Large Language Models (LLMs) because of its powerful attention-based learning and parallel architecture. As the models grow gigantic…

Machine Learning · Computer Science 2026-01-21 Phani Kumar , Nyshadham , Jyothendra Varma , Polisetty V R K , Aditya Rathore

Transformer-based models are widely used in natural language processing (NLP). Their core component, self-attention, has attracted widespread interest. To understand the self-attention mechanism, a direct method is to visualize the attention…

Machine Learning · Computer Science 2021-07-02 Han Shi , Jiahui Gao , Xiaozhe Ren , Hang Xu , Xiaodan Liang , Zhenguo Li , James T. Kwok

Transformers have been considered among the most important deep learning models since 2018, in part because they establish state-of-the-art (SOTA) records and could potentially replace existing Deep Neural Networks (DNNs). Despite the remarkable…

Machine Learning · Computer Science 2022-08-23 Hongwu Peng , Shaoyi Huang , Shiyang Chen , Bingbing Li , Tong Geng , Ang Li , Weiwen Jiang , Wujie Wen , Jinbo Bi , Hang Liu , Caiwen Ding

The Transformer architecture model, based on self-attention and multi-head attention, has achieved remarkable success in offline end-to-end Automatic Speech Recognition (ASR). However, self-attention and multi-head attention cannot be…

Computation and Language · Computer Science 2022-10-03 Chendong Zhao , Jianzong Wang , Wen qi Wei , Xiaoyang Qu , Haoqian Wang , Jing Xiao

Transformers are the mainstream of NLP applications and are becoming increasingly popular in other domains such as Computer Vision. Despite the improvements in model quality, the enormous computation costs make Transformers difficult at…

Machine Learning · Computer Science 2021-10-22 Liu Liu , Zheng Qu , Zhaodong Chen , Yufei Ding , Yuan Xie

We present an end-to-end trained memory system that quickly adapts to new data and generates samples like them. Inspired by Kanerva's sparse distributed memory, it has a robust distributed reading and writing mechanism. The memory is…

Machine Learning · Statistics 2018-06-19 Yan Wu , Greg Wayne , Alex Graves , Timothy Lillicrap

The key to a Transformer model is the self-attention mechanism, which allows the model to analyze an entire sequence in a computationally efficient manner. Recent work has suggested the possibility that general attention mechanisms used by…

Machine Learning · Computer Science 2020-01-01 Thomas Dowdell , Hongyu Zhang

Continual learning is a problem for artificial neural networks that their biological counterparts are adept at solving. Building on work using Sparse Distributed Memory (SDM) to connect a core neural circuit with the powerful Transformer…

Neural and Evolutionary Computing · Computer Science 2023-03-28 Trenton Bricken , Xander Davies , Deepak Singh , Dmitry Krotov , Gabriel Kreiman

The quadratic complexity of self-attention in Transformer models remains a significant bottleneck for processing long sequences and deploying large language models efficiently. For this approach, there has been significant research into…

Computation and Language · Computer Science 2026-05-26 Spandan Pratyush

Accommodating long sequences efficiently in autoregressive Transformers, especially within an extended context window, poses significant challenges due to the quadratic computational complexity and substantial KV memory requirements…

Computation and Language · Computer Science 2024-06-25 Chao Lou , Zixia Jia , Zilong Zheng , Kewei Tu

Transformers use the dense self-attention mechanism which gives a lot of flexibility for long-range connectivity. Over multiple layers of a deep transformer, the number of possible connectivity patterns increases exponentially. However,…

Machine Learning · Computer Science 2023-06-05 Md Shamim Hussain , Mohammed J. Zaki , Dharmashankar Subramanian

To overcome the quadratic cost of self-attention, recent works have proposed various sparse attention modules, most of which fall into one of two groups: 1) sparse attention under hand-crafted patterns and 2) full attention followed by a…

Machine Learning · Computer Science 2022-10-28 Sungjun Cho , Seonwoo Min , Jinwoo Kim , Moontae Lee , Honglak Lee , Seunghoon Hong

In this paper, we share our reflections and insights on understanding Transformer architectures through the lens of associative memory--a classic psychological concept inspired by human cognition. We start with the basics of associative…

Machine Learning · Computer Science 2025-05-27 Shu Zhong , Mingyu Xu , Tenglong Ao , Guang Shi

Self-attention has recently been adopted for a wide range of sequence modeling problems. Despite its effectiveness, self-attention suffers from quadratic compute and memory requirements with respect to sequence length. Successful approaches…

Machine Learning · Computer Science 2020-10-27 Aurko Roy , Mohammad Saffar , Ashish Vaswani , David Grangier

The attention mechanism has been extensively integrated into mainstream neural network architectures, such as Transformers and graph attention networks. Yet its underlying working principles remain somewhat elusive. What is its essence? Are…

Machine Learning · Computer Science 2024-12-25 Tianyu Ruan , Shihua Zhang

The Transformer is a fully attention-based alternative to recurrent networks that has achieved state-of-the-art results across a range of NLP tasks. In this paper, we analyze the structure of attention in a Transformer language model, the…

Computation and Language · Computer Science 2019-06-20 Jesse Vig , Yonatan Belinkov

The Transformer has achieved great success in NLP. However, the quadratic complexity of its self-attention mechanism makes it inefficient at handling long sequences. Many existing works explore accelerating Transformers by…

Computation and Language · Computer Science 2021-09-03 Chuhan Wu , Fangzhao Wu , Tao Qi , Binxing Jiao , Daxin Jiang , Yongfeng Huang , Xing Xie

Attention mechanisms have become ubiquitous in NLP. Recent architectures, notably the Transformer, learn powerful context-aware word representations through layered, multi-headed attention. The multiple heads learn diverse types of word…

Computation and Language · Computer Science 2019-09-09 Gonçalo M. Correia , Vlad Niculae , André F. T. Martins

We introduce the Momentum Transformer, an attention-based deep-learning architecture, which outperforms benchmark time-series momentum and mean-reversion trading strategies. Unlike state-of-the-art Long Short-Term Memory (LSTM)…

Machine Learning · Computer Science 2022-11-24 Kieran Wood , Sven Giegerich , Stephen Roberts , Stefan Zohren