Related papers: Learning Associative Inference Using Fast Weight M…

Fast Weight Long Short-Term Memory

Associative memory using fast weights is a short-term memory mechanism that substantially improves the memory capacity and time scale of recurrent neural networks (RNNs). As recent studies introduced fast weights only to regular RNNs, it is…

Neural and Evolutionary Computing · Computer Science · 2018-04-19 · T. Anderson Keller, Sharath Nittur Sridhar, Xin Wang

FastMem: Fast Memorization of Prompt Improves Context Awareness of Large Language Models

Large language models (LLMs) excel in generating coherent text, but they often struggle with context awareness, leading to inaccuracies in tasks requiring faithful adherence to provided information. We introduce FastMem, a novel method…

Computation and Language · Computer Science · 2024-10-08 · Junyi Zhu, Shuochen Liu, Yu Yu, Bo Tang, Yibo Yan, Zhiyu Li, Feiyu Xiong, Tong Xu, Matthew B. Blaschko

Meta-Learning Fast Weight Language Models

Dynamic evaluation of language models (LMs) adapts model parameters at test time using gradient information from previous tokens and substantially improves LM performance. However, it requires over 3x more compute than standard inference.…

Computation and Language · Computer Science · 2022-12-06 · Kevin Clark, Kelvin Guu, Ming-Wei Chang, Panupong Pasupat, Geoffrey Hinton, Mohammad Norouzi

Ensemble Neural Relation Extraction with Adaptive Boosting

Relation extraction has been widely studied to extract new relational facts from open corpora. Previous relation extraction methods suffer from wrong labels and noisy data, which substantially decrease the performance of…

Information Retrieval · Computer Science · 2018-05-01 · Dongdong Yang, Senzhang Wang, Zhoujun Li

Learning, Fast and Slow: Towards LLMs That Adapt Continually

Large language models (LLMs) are trained for downstream tasks by updating their parameters (e.g., via RL). However, updating parameters forces them to absorb task-specific information, which can result in catastrophic forgetting and loss of…

Machine Learning · Computer Science · 2026-05-15 · Rishabh Tiwari, Kusha Sareen, Lakshya A Agrawal, Joseph E. Gonzalez, Matei Zaharia, Kurt Keutzer, Inderjit S Dhillon, Rishabh Agarwal, Devvrit Khatri

LSTM Neural Reordering Feature for Statistical Machine Translation

Artificial neural networks are powerful models that have been widely applied to many aspects of machine translation, such as language modeling and translation modeling. Though notable improvements have been made in these areas, the…

Computation and Language · Computer Science · 2017-09-25 · Yiming Cui, Shijin Wang, Jianfeng Li

Linear Transformers Are Secretly Fast Weight Programmers

We show the formal equivalence of linearised self-attention mechanisms and fast weight controllers from the early '90s, where a "slow" neural net learns by gradient descent to program the "fast weights" of another net through sequences of…

Machine Learning · Computer Science · 2021-06-10 · Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber

LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance in various tasks. However, their substantial computational and memory requirements present challenges, especially for devices…

Computation and Language · Computer Science · 2024-08-01 · Keivan Alizadeh, Iman Mirzadeh, Dmitry Belenko, Karen Khatamifard, Minsik Cho, Carlo C Del Mundo, Mohammad Rastegari, Mehrdad Farajtabar

Fusing Sentence Embeddings Into LSTM-based Autoregressive Language Models

Although masked language models are highly performant and widely adopted by NLP practitioners, they cannot be easily used for autoregressive language modelling (next word prediction and sequence probability estimation). We present an…

Computation and Language · Computer Science · 2022-08-08 · Vilém Zouhar, Marius Mosbach, Dietrich Klakow

Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers

Large Language Models (LLMs) have the capacity to store and recall facts. Through experimentation with open-source models, we observe that this ability to retrieve facts can be easily manipulated by changing contexts, even without altering…

Computation and Language · Computer Science · 2024-12-02 · Yibo Jiang, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam

Associative Long Short-Term Memory

We investigate a new method to augment recurrent neural networks with extra memory without increasing the number of network parameters. The system has an associative memory based on complex-valued vectors and is closely related to…

Neural and Evolutionary Computing · Computer Science · 2016-05-20 · Ivo Danihelka, Greg Wayne, Benigno Uria, Nal Kalchbrenner, Alex Graves

Enhanced LSTM for Natural Language Inference

Reasoning and inference are central to human and artificial intelligence. Modeling inference in human language is very challenging. With the availability of large annotated data (Bowman et al., 2015), it has recently become feasible to…

Computation and Language · Computer Science · 2020-03-04 · Qian Chen, Xiaodan Zhu, Zhenhua Ling, Si Wei, Hui Jiang, Diana Inkpen

Fast weight programming and linear transformers: from machine learning to neurobiology

Recent advances in artificial neural networks for machine learning, and language modeling in particular, have established a family of recurrent neural network (RNN) architectures that, unlike conventional RNNs with vector-form hidden…

Machine Learning · Computer Science · 2026-03-19 · Kazuki Irie, Samuel J. Gershman

Advanced LSTM: A Study about Better Time Dependency Modeling in Emotion Recognition

Long short-term memory (LSTM) is normally used in recurrent neural networks (RNNs) as the basic recurrent unit. However, conventional LSTM assumes that the state at the current time step depends on the previous time step. This assumption constrains the…

Machine Learning · Computer Science · 2017-11-01 · Fei Tao, Gang Liu

Lightweight LLM Agent Memory with Small Language Models

Although LLM agents can leverage tools for complex tasks, they still need memory to maintain cross-turn consistency and accumulate reusable information in long-horizon interactions. However, retrieval-based external memory systems incur low…

Artificial Intelligence · Computer Science · 2026-04-23 · Jiaquan Zhang, Chaoning Zhang, Shuxu Chen, Zhenzhen Huang, Pengcheng Zheng, Zhicheng Wang, Ping Guo, Fan Mo, Sung-Ho Bae, Jie Zou, Jiwei Wei, Yang Yang

Large Language Models as Model Organisms for Human Associative Learning

Associative learning (forming links between co-occurring items) is fundamental to human cognition, reshaping internal representations in complex ways. Testing hypotheses on how representational changes occur in biological systems is…

Machine Learning · Computer Science · 2025-10-27 · Camila Kolling, Vy Ai Vo, Mariya Toneva

Learning Natural Language Inference with LSTM

Natural language inference (NLI) is a fundamentally important task in natural language processing that has many applications. The recently released Stanford Natural Language Inference (SNLI) corpus has made it possible to develop and…

Computation and Language · Computer Science · 2016-11-11 · Shuohang Wang, Jing Jiang

Learning to Control Rapidly Changing Synaptic Connections: An Alternative Type of Memory in Sequence Processing Artificial Neural Networks

Short-term memory in standard, general-purpose, sequence-processing recurrent neural networks (RNNs) is stored as activations of nodes or "neurons." Generalising feedforward NNs to such RNNs is mathematically straightforward and natural,…

Neural and Evolutionary Computing · Computer Science · 2022-11-18 · Kazuki Irie, Jürgen Schmidhuber

Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks

Factorization Machines (FMs) are a supervised learning approach that enhances the linear regression model by incorporating second-order feature interactions. Despite their effectiveness, FMs can be hindered by their modelling of all feature…

Machine Learning · Computer Science · 2017-08-17 · Jun Xiao, Hao Ye, Xiangnan He, Hanwang Zhang, Fei Wu, Tat-Seng Chua

Learning to Remember, Forget and Ignore using Attention Control in Memory

Typical neural networks with external memory do not effectively separate capacity for episodic and working memory as is required for reasoning in humans. Applying knowledge gained from psychological studies, we designed a new model called…

Machine Learning · Computer Science · 2018-10-01 · T. S. Jayram, Younes Bouhadjar, Ryan L. McAvoy, Tomasz Kornuta, Alexis Asseman, Kamil Rocki, Ahmet S. Ozcan