
Related papers: Global memory transformer for processing long docu…

200 papers

Transformer-based models have become ubiquitous in natural language processing thanks to their large capacity, innate parallelism and high performance. The contextualizing component of a Transformer block is the pairwise…

Machine Learning · Computer Science · 2020-06-08 · Ankit Gupta, Jonathan Berant

Transformer-based models have achieved state-of-the-art results in many natural language processing tasks. The self-attention architecture allows a transformer to combine information from all elements of a sequence into context-aware…

Computation and Language · Computer Science · 2021-02-17 · Mikhail S. Burtsev, Yuri Kuratov, Anton Peganov, Grigory V. Sapunov

Pre-trained language models demonstrate general intelligence and common sense, but long inputs quickly become a bottleneck for memorizing information at inference time. We resurface a simple method, Memorizing Transformers (Wu et al.,…

Machine Learning · Computer Science · 2024-06-05 · Phoebe Klett, Thomas Ahle

The ability to extrapolate from short problem instances to longer ones is an important form of out-of-distribution generalization in reasoning tasks, and is crucial when learning from datasets where longer problem instances are rare. These…

Computation and Language · Computer Science · 2022-11-15 · Cem Anil, Yuhuai Wu, Anders Andreassen, Aitor Lewkowycz, Vedant Misra, Vinay Ramasesh, Ambrose Slone, Guy Gur-Ari, Ethan Dyer, Behnam Neyshabur

Transformers have become the gold standard for many natural language processing tasks and, in particular, for multi-hop question answering (MHQA). This task involves processing a long document and reasoning over multiple parts of it.…

Computation and Language · Computer Science · 2023-12-01 · Alsu Sagirova, Mikhail Burtsev

Conventional wisdom has it that, unlike recurrent models, Transformers cannot perfectly model regular languages. Inspired by the notion of working memory, we propose a new Transformer variant named RegularGPT. With its novel combination of…

Computation and Language · Computer Science · 2023-05-09 · Ta-Chung Chi, Ting-Han Fan, Alexander I. Rudnicky, Peter J. Ramadge

Decision Transformer-based decision-making agents have shown the ability to generalize across multiple tasks. However, their performance relies on massive data and computation. We argue that this inefficiency stems from the forgetting…

Machine Learning · Computer Science · 2024-05-30 · Jikun Kang, Romain Laroche, Xingdi Yuan, Adam Trischler, Xue Liu, Jie Fu

Pre-trained language models have recently emerged as a powerful tool that can be fine-tuned for a variety of language tasks. Ideally, when models are pre-trained on a large amount of data, they are expected to gain implicit knowledge. In this paper, we…

Computation and Language · Computer Science · 2023-06-22 · Mohamad Ballout, Ulf Krumnack, Gunther Heidemann, Kai-Uwe Kühnberger

We show that transformer-based large language models are computationally universal when augmented with an external memory. Any deterministic language model that conditions on strings of bounded length is equivalent to a finite automaton,…

Computation and Language · Computer Science · 2023-01-12 · Dale Schuurmans

A major limitation on the range of problems solvable by transformers is the quadratic scaling of computational complexity with input size. In this study, we investigate the recurrent memory augmentation of pre-trained transformer…

Computation and Language · Computer Science · 2024-02-07 · Aydar Bulatov, Yuri Kuratov, Yermek Kapushev, Mikhail S. Burtsev

This paper introduces the Large Memory Model (LM2), a decoder-only Transformer architecture enhanced with an auxiliary memory module that aims to address the limitations of standard Transformers in multi-step reasoning, relational…

Computation and Language · Computer Science · 2025-02-11 · Jikun Kang, Wenqi Wu, Filippos Christianos, Alex J. Chan, Fraser Greenlee, George Thomas, Marvin Purtorab, Andy Toulis

Most approaches to long-context processing increase the complexity of the transformer's internal architecture by integrating mechanisms such as recurrence or auxiliary memory modules. In this work, we introduce an alternative approach that…

Computation and Language · Computer Science · 2025-10-28 · Billy Dickson, Zoran Tiganj

This paper addresses the limitations of large language models in understanding long-term context. It proposes a model architecture equipped with a long-term memory mechanism to improve the retention and retrieval of semantic information…

Computation and Language · Computer Science · 2025-05-30 · Yue Xing, Tao Yang, Yijiashun Qi, Minggu Wei, Yu Cheng, Honghui Xin

Transformers achieve state-of-the-art performance for natural language processing tasks by pre-training on large-scale text corpora. They are extremely compute-intensive and have very high sample complexity. Memory replay is a mechanism…

Machine Learning · Computer Science · 2022-05-23 · Rui Liu, Barzan Mozafari

World modelling, i.e. building a representation of the rules that govern the world so as to predict its evolution, is an essential ability for any agent interacting with the physical world. Recent applications of the Transformer…

Machine Learning · Computer Science · 2024-05-31 · Francesco Petri, Luigi Asprino, Aldo Gangemi

Conversational AI systems that rely on Large Language Models, like Transformers, have difficulty interweaving external data (like facts) with the language they generate. Vanilla Transformer architectures are not designed for answering…

Computation and Language · Computer Science · 2024-03-01 · Stephan Raaijmakers, Roos Bakker, Anita Cremers, Roy de Kleijn, Tom Kouwenhoven, Tessa Verhoef

World models enable agents to plan within imagined environments by predicting future states conditioned on past observations and actions. However, their ability to plan over long horizons is limited by the effective memory span of the…

Artificial Intelligence · Computer Science · 2025-12-09 · Eli J. Laird, Corey Clark

We present a language model that combines a large parametric neural network (i.e., a transformer) with a non-parametric episodic memory component in an integrated architecture. Our model uses extended short-term context by caching local…

Computation and Language · Computer Science · 2021-02-05 · Dani Yogatama, Cyprien de Masson d'Autume, Lingpeng Kong

End-to-end task-oriented dialogue is challenging since knowledge bases are usually large, dynamic and hard to incorporate into a learning framework. We propose the global-to-local memory pointer (GLMP) networks to address this issue. In our…

Computation and Language · Computer Science · 2019-04-01 · Chien-Sheng Wu, Richard Socher, Caiming Xiong

The impressive performance gains of modern language models currently rely on scaling parameters: larger models store more world knowledge and reason better. Yet compressing all world knowledge into parameters is unnecessary, as only a…

Computation and Language · Computer Science · 2026-03-24 · Hadi Pouransari, David Grangier, C Thomas, Michael Kirchhof, Oncel Tuzel