Related papers: Memorizing Transformers

200 papers

Pre-trained language models demonstrate general intelligence and common sense, but long inputs quickly become a bottleneck for memorizing information at inference time. We resurface a simple method, Memorizing Transformers (Wu et al.,…

Machine Learning · Computer Science 2024-06-05 Phoebe Klett, Thomas Ahle

Transformer-based models have achieved state-of-the-art results in many natural language processing tasks. The self-attention architecture allows transformer to combine information from all elements of a sequence into context-aware…

Computation and Language · Computer Science 2021-02-17 Mikhail S. Burtsev, Yuri Kuratov, Anton Peganov, Grigory V. Sapunov

The ability of machine learning models to store input information in hidden layer vector embeddings, analogous to the concept of 'memory', is widely employed but not well characterized. We find that language model embeddings typically…

Computation and Language · Computer Science 2026-05-20 Benjamin L. Badger

A better understanding of the emergent computation and problem-solving capabilities of recent large language models is of paramount importance to further improve them and broaden their applicability. This work investigates how a language…

Artificial Intelligence · Computer Science 2024-08-05 Davide Maltoni, Matteo Ferrara

In this paper we propose augmenting Vision Transformer models with learnable memory tokens. Our approach allows the model to adapt to new tasks, using few parameters, while optionally preserving its capabilities on previously learned tasks.…

Computer Vision and Pattern Recognition · Computer Science 2022-03-31 Mark Sandler, Andrey Zhmoginov, Max Vladymyrov, Andrew Jackson

Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore, which allows them to learn through explicitly memorizing the training datapoints. While effective, these models often…

Computation and Language · Computer Science 2021-11-16 Junxian He, Graham Neubig, Taylor Berg-Kirkpatrick

Continual incorporation of new knowledge is essential for the long-term evolution of large language models (LLMs). Existing approaches typically rely on parameter-update algorithms to mitigate catastrophic forgetting, yet they suffer from…

Machine Learning · Computer Science 2026-05-07 Kaustubh Pethkar, Ziyang Xiong, Zuofeng Shang, Yingcong Li

Image captioning models aim at connecting Vision and Language by providing natural language descriptions of input images. In the past few years, the task has been tackled by learning parametric models and proposing visual feature extraction…

Computer Vision and Pattern Recognition · Computer Science 2022-08-23 Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Modern large language models (LLMs) excel at tasks that require storing and retrieving knowledge, such as factual recall and question answering. Transformers are central to this capability because they can encode information during training…

Machine Learning · Statistics 2026-03-18 Nuri Mert Vural, Alberto Bietti, Mahdi Soltanolkotabi, Denny Wu

We propose a new method for estimating how much a model knows about a datapoint and use it to measure the capacity of modern language models. Prior studies of language model memorization have struggled to disentangle memorization from…

Recent research suggests that the feed-forward module within Transformers can be viewed as a collection of key-value memories, where the keys learn to capture specific patterns from the input based on the training examples. The values then…

Computation and Language · Computer Science 2023-10-25 Sunit Bhattacharya, Ondrej Bojar

The impressive performance gains of modern language models currently rely on scaling parameters: larger models store more world knowledge and reason better. Yet compressing all world knowledge into parameters is unnecessary, as only a…

Computation and Language · Computer Science 2026-03-24 Hadi Pouransari, David Grangier, C Thomas, Michael Kirchhof, Oncel Tuzel

Recent studies have demonstrated that the performance of transformers on the task of language modeling obeys a power-law relationship with model size over six orders of magnitude. While transformers exhibit impressive scaling, their…

Machine Learning · Computer Science 2021-10-07 Narsimha Chilkuri, Eric Hunsberger, Aaron Voelker, Gurshaant Malik, Chris Eliasmith

Large Transformer models have achieved impressive performance in many natural language tasks. In particular, Transformer based language models have been shown to have great capabilities in encoding factual knowledge in their vast amount of…

Computation and Language · Computer Science 2020-12-02 Chen Zhu, Ankit Singh Rawat, Manzil Zaheer, Srinadh Bhojanapalli, Daliang Li, Felix Yu, Sanjiv Kumar

Tool-augmented language models, equipped with retrieval, memory, or external APIs, are reshaping AI, yet their theoretical advantages remain underexplored. In this paper, we address this question by demonstrating the benefits of in-tool…

Machine Learning · Computer Science 2026-04-03 Sam Houliston, Ambroise Odonnat, Charles Arnal, Vivien Cabannes

Foundation language models learn from their finetuning input context in different ways. In this paper, we reformulate inputs during finetuning for challenging translation tasks, leveraging model strengths from pretraining in novel ways to…

Computation and Language · Computer Science 2026-01-05 Brian Yu, Hansen Lillemark, Kurt Keutzer

A major limitation for the broader scope of problems solvable by transformers is the quadratic scaling of computational complexity with input size. In this study, we investigate the recurrent memory augmentation of pre-trained transformer…

Computation and Language · Computer Science 2024-02-07 Aydar Bulatov, Yuri Kuratov, Yermek Kapushev, Mikhail S. Burtsev

Despite their wide adoption, the underlying training and memorization dynamics of very large language models is not well understood. We empirically study exact memorization in causal and masked language modeling, across model sizes and…

Computation and Language · Computer Science 2022-11-04 Kushal Tirumala, Aram H. Markosyan, Luke Zettlemoyer, Armen Aghajanyan

Recent large language models (LLM) exhibit sub-optimal performance on low-resource languages, as the training data of these models is usually dominated by English and other high-resource languages. Furthermore, it is challenging to train…

Computation and Language · Computer Science 2023-12-18 Zoltan Csaki, Pian Pawakapan, Urmish Thakker, Qiantong Xu

A distinction is often drawn between a model's ability to predict a label for an evaluation sample that is directly memorised from highly similar training samples versus an ability to predict the label via some method of generalisation. In…

Computation and Language · Computer Science 2023-11-22 Tim Hartill, Joshua Bensemann, Michael Witbrock, Patricia J. Riddle