Related papers: Memory-Augmented Generative Adversarial Transforme…

Stateful Memory-Augmented Transformers for Efficient Dialogue Modeling

Transformer encoder-decoder models have achieved great performance in dialogue generation tasks; however, their inability to process long dialogue history often leads to truncation of the context. To address this problem, we propose a novel…

Computation and Language · Computer Science 2023-05-24 Qingyang Wu, Zhou Yu

Memory-Augmented Architecture for Long-Term Context Handling in Large Language Models

Large Language Models face significant challenges in maintaining coherent interactions over extended dialogues due to their limited contextual memory. This limitation often leads to fragmented exchanges and reduced relevance in responses,…

Machine Learning · Computer Science 2025-06-24 Haseeb Ullah Khan Shinwari, Muhammad Usama

Memory-Augmented Transformers: A Systematic Review from Neuroscience Principles to Enhanced Model Architectures

Memory is fundamental to intelligence, enabling learning, reasoning, and adaptability across biological and artificial systems. While Transformer architectures excel at sequence modeling, they face critical limitations in long-range context…

Machine Learning · Computer Science 2025-08-19 Parsa Omidi, Xingshuai Huang, Axel Laborieux, Bahareh Nikpour, Tianyu Shi, Armaghan Eshaghi

Enhancing Conversational Agents via Task-Oriented Adversarial Memory Adaptation

Conversational agents struggle to handle long conversations due to context window limitations. Therefore, memory systems are developed to leverage essential historical information. Existing memory systems typically follow a pipeline of…

Computation and Language · Computer Science 2026-01-30 Yimin Deng, Yuqing Fu, Derong Xu, Yejing Wang, Wei Ni, Jingtong Gao, Xiaopeng Li, Chengxu Liu, Xiao Han, Guoshuai Zhao, Xiangyu Zhao, Li Zhu, Xueming Qian

Memory Transformer

Transformer-based models have achieved state-of-the-art results in many natural language processing tasks. The self-attention architecture allows the Transformer to combine information from all elements of a sequence into context-aware…

Computation and Language · Computer Science 2021-02-17 Mikhail S. Burtsev, Yuri Kuratov, Anton Peganov, Grigory V. Sapunov
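The Memory Transformer abstract describes self-attention as combining information from all elements of a sequence into context-aware representations. A minimal pure-Python sketch of generic scaled dot-product self-attention follows. It assumes identity query/key/value projections for brevity, and the function names are illustrative; it is not the paper's memory-token model.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(xs: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with identity projections (Q = K = V = xs)."""
    d = len(xs[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in xs:
        # Score this query against every element of the sequence, itself included.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in xs]
        weights = softmax(scores)
        # Each output is a convex combination of all value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, xs)) for j in range(d)])
    return outputs
```

Prepending extra vectors to `xs` makes them take part in the same attention as ordinary tokens. As we understand it, this is roughly how the paper's memory tokens interact with the input, but the sketch involves no training and no learned memory.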

On Difficulties of Attention Factorization through Shared Memory

Transformers have revolutionized deep learning in numerous fields, including natural language processing, computer vision, and audio processing. Their strength lies in their attention mechanism, which allows for the discovery of complex…

Machine Learning · Computer Science 2024-04-02 Uladzislau Yorsh, Martin Holeňa, Ondřej Bojar, David Herel
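The entry above concerns factorizing attention through a shared memory. A minimal sketch of that general idea follows, with hypothetical names (`attend`, `shared_memory_attention`, `score_count`) and no learned projections: M memory slots first attend over the L tokens, then each token attends over the updated slots, so the number of query-key scores drops from L·L to 2·L·M. The memory slots are simply passed in here (in practice they would be learned), and the sketch does not reproduce the specific models or difficulties analysed in the paper.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention of every query over the given keys/values."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        dim = len(values[0])
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)])
    return out


def shared_memory_attention(tokens: List[Vector], memory: List[Vector]) -> List[Vector]:
    # Step 1: each memory slot reads from all tokens (M x L scores).
    summary = attend(memory, tokens, tokens)
    # Step 2: each token reads from the summarized memory (L x M scores).
    return attend(tokens, summary, summary)


def score_count(length: int, slots: int) -> Tuple[int, int]:
    """Query-key scores for full attention vs. the two-step shared-memory route."""
    return length * length, 2 * length * slots
```

For instance, `score_count(1024, 16)` gives `(1048576, 32768)`; the price of the saving is that all token-to-token interaction has to pass through the M-slot bottleneck.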

Personalized Large Language Model Assistant with Evolving Conditional Memory

With the rapid development of large language models, AI assistants like ChatGPT have become increasingly integrated into people's work and lives but remain limited in providing personalized services. In this paper, we present a plug-and-play framework…

Computation and Language · Computer Science 2024-10-15 Ruifeng Yuan, Shichao Sun, Yongqi Li, Zili Wang, Ziqiang Cao, Wenjie Li

Extended Mind Transformers

Pre-trained language models demonstrate general intelligence and common sense, but long inputs quickly become a bottleneck for memorizing information at inference time. We resurface a simple method, Memorizing Transformers (Wu et al.,…

Machine Learning · Computer Science 2024-06-05 Phoebe Klett, Thomas Ahle
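The Extended Mind Transformers abstract says it resurfaces Memorizing Transformers. As we understand that method, past keys and values are cached in an external memory, and each query retrieves its top-k most similar keys by dot product before attending over them. The sketch below is a simplified, hypothetical illustration (`KNNMemory` and `attend_to_memory` are our names). It uses exact search rather than an approximate nearest-neighbour index and omits learned projections and any mechanism for combining memory with local attention.

```python
import heapq
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


class KNNMemory:
    """Hypothetical external key-value store queried by top-k dot-product similarity."""

    def __init__(self) -> None:
        self.keys: List[Vector] = []
        self.values: List[Vector] = []

    def add(self, keys: List[Vector], values: List[Vector]) -> None:
        # Cache key/value pairs, e.g. from earlier segments of a long input.
        self.keys.extend(keys)
        self.values.extend(values)

    def retrieve(self, query: Vector, k: int) -> List[Tuple[Vector, Vector]]:
        # Exact top-k search, most similar first.
        idx = heapq.nlargest(k, range(len(self.keys)), key=lambda i: dot(query, self.keys[i]))
        return [(self.keys[i], self.values[i]) for i in idx]


def attend_to_memory(query: Vector, memory: KNNMemory, k: int) -> Vector:
    """Softmax attention restricted to the k retrieved memory entries."""
    retrieved = memory.retrieve(query, k)
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * dot(query, key) for key, _ in retrieved]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    dim = len(retrieved[0][1])
    return [sum(e / z * v[j] for e, (_, v) in zip(exps, retrieved)) for j in range(dim)]
```

Only k entries enter the softmax, so the attention cost per query stays fixed as the memory grows; the retrieval step is where the cost moves, which is why real systems rely on approximate search.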

Modifying Memories in Transformer Models

Large Transformer models have achieved impressive performance in many natural language tasks. In particular, Transformer-based language models have been shown to be highly capable of encoding factual knowledge in their vast amount of…

Computation and Language · Computer Science 2020-12-02 Chen Zhu, Ankit Singh Rawat, Manzil Zaheer, Srinadh Bhojanapalli, Daliang Li, Felix Yu, Sanjiv Kumar

Memory in humans and deep language models: Linking hypotheses for model augmentation

The computational complexity of the self-attention mechanism in Transformer models significantly limits their ability to generalize over long temporal durations. Memory-augmentation, or the explicit storing of past information in external…

Computation and Language · Computer Science 2022-11-29 Omri Raccah, Phoebe Chen, Ted L. Willke, David Poeppel, Vy A. Vo

Graph Integrated Language Transformers for Next Action Prediction in Complex Phone Calls

Current conversational AI systems employ various machine learning pipelines, along with external knowledge sources and business logic, to predict the next action. Maintaining the various components of a dialogue manager's pipeline adds…

Computation and Language · Computer Science 2024-04-15 Amin Hosseiny Marani, Ulie Schnaithmann, Youngseo Son, Akil Iyer, Manas Paldhe, Arushi Raghuvanshi

Knowledge-Infused Self Attention Transformers

Transformer-based language models have achieved impressive success in various natural language processing tasks due to their ability to capture complex dependencies and contextual information using self-attention mechanisms. However, they…

Computation and Language · Computer Science 2023-06-26 Kaushik Roy, Yuxin Zi, Vignesh Narayanan, Manas Gaur, Amit Sheth

Applications of the Transformer Architecture in AI-Assisted English Reading Comprehension

This paper studies interpretable and fair artificial intelligence architectures for English reading comprehension. It introduces transformer-based models that integrate advanced attention mechanisms and gradient-based feature attribution. The…

Computation and Language · Computer Science 2026-04-28 Ping Li

Learn To Remember: Transformer with Recurrent Memory for Document-Level Machine Translation

The Transformer architecture has led to significant gains in machine translation. However, most studies focus only on sentence-level translation without considering the context dependency within documents, leading to the inadequacy of…

Artificial Intelligence · Computer Science 2022-10-21 Yukun Feng, Feng Li, Ziang Song, Boyuan Zheng, Philipp Koehn

∞-former: Infinite Memory Transformer

Transformers are unable to model long-term memories effectively, since the amount of computation they need to perform grows with the context length. While variations of efficient transformers have been proposed, they all have a finite…

Computation and Language · Computer Science 2022-03-28 Pedro Henrique Martins, Zita Marinho, André F. T. Martins

A Reinforced Generation of Adversarial Examples for Neural Machine Translation

Neural machine translation systems tend to fail on lower-quality inputs despite their significant efficacy, which may seriously harm the credibility of these systems. Understanding how and when neural systems fail in such cases is critical…

Computation and Language · Computer Science 2020-05-27 Wei Zou, Shujian Huang, Jun Xie, Xinyu Dai, Jiajun Chen

Building A Unified AI-centric Language System: analysis, framework and future work

Recent advancements in large language models have demonstrated that extended inference through techniques can markedly improve performance, yet these gains come with increased computational costs and the propagation of inherent biases found…

Computation and Language · Computer Science 2025-02-10 Edward Hong Wang, Cynthia Xin Wen

GMAT: Global Memory Augmentation for Transformers

Transformer-based models have become ubiquitous in natural language processing thanks to their large capacity, innate parallelism and high performance. The contextualizing component of a Transformer block is the pairwise…

Machine Learning · Computer Science 2020-06-08 Ankit Gupta, Jonathan Berant

Procedural Memory Is Not All You Need: Bridging Cognitive Gaps in LLM-Based Agents

Large Language Models (LLMs) represent a landmark achievement in Artificial Intelligence (AI), demonstrating unprecedented proficiency in procedural tasks such as text generation, code completion, and conversational coherence. These…

Artificial Intelligence · Computer Science 2025-05-07 Schaun Wheeler, Olivier Jeunen

Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism

Transformer-based models have demonstrated their effectiveness in automatic speech recognition (ASR) tasks and even shown superior performance over the conventional hybrid framework. The main idea of Transformers is to capture the…

Sound · Computer Science 2022-07-05 Kun Wei, Pengcheng Guo, Ning Jiang