Related papers: Evaluating Memory Structure in LLM Agents

On the Structural Memory of LLM Agents

Memory plays a pivotal role in enabling large language model (LLM)-based agents to engage in complex and long-term interactions, such as question answering (QA) and dialogue systems. While various memory modules have been proposed for these…

Computation and Language · Computer Science 2024-12-23 Ruihong Zeng, Jinyuan Fang, Siwei Liu, Zaiqiao Meng

StructMem: Structured Memory for Long-Horizon Behavior in LLMs

Long-term conversational agents need memory systems that capture relationships between events, not merely isolated facts, to support temporal reasoning and multi-hop question answering. Current approaches face a fundamental trade-off: flat…

Computation and Language · Computer Science 2026-04-24 Buqiang Xu, Yijun Chen, Jizhan Fang, Ruobin Zhong, Yunzhi Yao, Yuqi Zhu, Lun Du, Shumin Deng

Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions

Recent benchmarks for Large Language Model (LLM) agents primarily focus on evaluating reasoning, planning, and execution capabilities, while another critical component, memory, encompassing how agents memorize, update, and retrieve long-term…

Computation and Language · Computer Science 2026-03-19 Yuanzhe Hu, Yu Wang, Julian McAuley

Choosing How to Remember: Adaptive Memory Structures for LLM Agents

Memory is critical for enabling large language model (LLM) based agents to maintain coherent behavior over long-horizon interactions. However, existing agent memory systems suffer from two key gaps: they rely on a one-size-fits-all memory…

Artificial Intelligence · Computer Science 2026-02-17 Mingfei Lu, Mengjia Wu, Feng Liu, Jiawei Xu, Weikai Li, Haoyang Wang, Zhengdong Hu, Ying Ding, Yizhou Sun, Jie Lu, Yi Zhang

EvoMemBench: Benchmarking Agent Memory from a Self-Evolving Perspective

Recent benchmarks for Large Language Model (LLM) agents mainly evaluate reasoning, planning, and execution. However, memory is also essential for agents, as it enables them to store, update, and retrieve information over time. This ability…

Computation and Language · Computer Science 2026-05-19 Yuyao Wang, Zhongjian Zhang, Mo Chi, Kaichi Yu, Yuhan Li, Miao Peng, Bing Tong, Chen Zhang, Yan Zhou, Jia Li

MemBench: Towards More Comprehensive Evaluation on the Memory of LLM-based Agents

Recent works have highlighted the significance of memory mechanisms in LLM-based agents, which enable them to store observed information and adapt to dynamic environments. However, evaluating their memory capabilities still remains…

Computation and Language · Computer Science 2025-06-30 Haoran Tan, Zeyu Zhang, Chen Ma, Xu Chen, Quanyu Dai, Zhenhua Dong

LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory

Recent large language model (LLM)-driven chat assistant systems have integrated memory components to track user-assistant chat histories, enabling more accurate and personalized responses. However, their long-term memory capabilities in…

Computation and Language · Computer Science 2025-03-06 Di Wu, Hongwei Wang, Wenhao Yu, Yuwei Zhang, Kai-Wei Chang, Dong Yu

Hierarchical Memory for High-Efficiency Long-Term Reasoning in LLM Agents

Long-term memory is one of the key factors influencing the reasoning capabilities of Large Language Model Agents (LLM Agents). Incorporating a memory mechanism that effectively integrates past interactions can significantly enhance…

Computation and Language · Computer Science 2025-08-01 Haoran Sun, Shaoning Zeng

Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework

Memory emerges as the core module in large language model (LLM)-based agents for long-horizon complex tasks (e.g., multi-turn dialogue, game playing, scientific discovery), where memory can enable knowledge accumulation, iterative…

Computation and Language · Computer Science 2026-05-04 Yanchen Wu, Tenghui Lin, Yingli Zhou, Fangyuan Zhang, Qintian Guo, Xun Zhou, Sibo Wang, Xilin Liu, Yuchi Ma, Yixiang Fang

StoryBench: A Dynamic Benchmark for Evaluating Long-Term Memory with Multi Turns

Long-term memory (LTM) is essential for large language models (LLMs) to achieve autonomous intelligence in complex, evolving environments. Despite increasing efforts in memory-augmented and retrieval-based architectures, there remains a…

Computation and Language · Computer Science 2025-06-17 Luanbo Wan, Weizhi Ma

A Survey on the Memory Mechanism of Large Language Model based Agents

Large language model (LLM)-based agents have recently attracted much attention from the research and industry communities. Compared with original LLMs, LLM-based agents are distinguished by their self-evolving capability, which is the basis for…

Artificial Intelligence · Computer Science 2024-04-23 Zeyu Zhang, Xiaohe Bo, Chen Ma, Rui Li, Xu Chen, Quanyu Dai, Jieming Zhu, Zhenhua Dong, Ji-Rong Wen

Lightweight LLM Agent Memory with Small Language Models

Although LLM agents can leverage tools for complex tasks, they still need memory to maintain cross-turn consistency and accumulate reusable information in long-horizon interactions. However, retrieval-based external memory systems incur low…

Artificial Intelligence · Computer Science 2026-04-23 Jiaquan Zhang, Chaoning Zhang, Shuxu Chen, Zhenzhen Huang, Pengcheng Zheng, Zhicheng Wang, Ping Guo, Fan Mo, Sung-Ho Bae, Jie Zou, Jiwei Wei, Yang Yang

Memp: Exploring Agent Procedural Memory

Large Language Model (LLM)-based agents excel at diverse tasks, yet they suffer from brittle procedural memory that is manually engineered or entangled in static parameters. In this work, we investigate strategies to endow agents with a…

Computation and Language · Computer Science 2026-04-16 Runnan Fang, Yuan Liang, Xiaobin Wang, Jialong Wu, Shuofei Qiao, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang

How Memory Management Impacts LLM Agents: An Empirical Study of Experience-Following Behavior

Memory is a critical component in large language model (LLM)-based agents, enabling them to store and retrieve past executions to improve task performance over time. In this paper, we conduct an empirical study on how memory management…

Artificial Intelligence · Computer Science 2025-10-14 Zidi Xiong, Yuping Lin, Wenya Xie, Pengfei He, Zirui Liu, Jiliang Tang, Himabindu Lakkaraju, Zhen Xiang

Procedural Memory Is Not All You Need: Bridging Cognitive Gaps in LLM-Based Agents

Large Language Models (LLMs) represent a landmark achievement in Artificial Intelligence (AI), demonstrating unprecedented proficiency in procedural tasks such as text generation, code completion, and conversational coherence. These…

Artificial Intelligence · Computer Science 2025-05-07 Schaun Wheeler, Olivier Jeunen

GroupMemBench: Benchmarking LLM Agent Memory in Multi-Party Conversations

Large Language Model (LLM) agents increasingly serve as personal assistants and workplace collaborators, where their utility depends on memory systems that extract, retrieve, and apply information across long-running conversations. However,…

Computation and Language · Computer Science 2026-05-19 Jingbo Yang, Kwei-Herng Lai, Xiaowen Wang, Shiyu Chang, Yaar Harari, Evgeniy Gabrilovich

Mem-Gallery: Benchmarking Multimodal Long-Term Conversational Memory for MLLM Agents

Long-term memory is a critical capability for multimodal large language model (MLLM) agents, particularly in conversational settings where information accumulates and evolves over time. However, existing benchmarks either evaluate…

Computation and Language · Computer Science 2026-01-08 Yuanchen Bei, Tianxin Wei, Xuying Ning, Yanjun Zhao, Zhining Liu, Xiao Lin, Yada Zhu, Hendrik Hamann, Jingrui He, Hanghang Tong

Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents

Large language model (LLM) agents face fundamental limitations in long-horizon reasoning due to finite context windows, making effective memory management critical. Existing methods typically handle long-term memory (LTM) and short-term…

Computation and Language · Computer Science 2026-05-01 Yi Yu, Liuyi Yao, Yuexiang Xie, Qingquan Tan, Jiaqi Feng, Yaliang Li, Libing Wu

Mem2ActBench: A Benchmark for Evaluating Long-Term Memory Utilization in Task-Oriented Autonomous Agents

Large Language Model (LLM)-based agents are increasingly deployed for complex, tool-based tasks where long-term memory is critical to driving actions. Existing benchmarks, however, primarily test an agent's ability to passively retrieve…

Computation and Language · Computer Science 2026-01-29 Yiting Shen, Kun Li, Wei Zhou, Songlin Hu

AgentLongBench: A Controllable Long Benchmark For Long-Contexts Agents via Environment Rollouts

The evolution of Large Language Models (LLMs) into autonomous agents necessitates the management of extensive, dynamic contexts. Current benchmarks, however, remain largely static, relying on passive retrieval tasks that fail to simulate…

Computation and Language · Computer Science 2026-02-02 Shicheng Fang, Yuxin Wang, Xiaoran Liu, Jiahao Lu, Chuanyuan Tan, Xinchi Chen, Yining Zheng, Xuanjing Huang, Xipeng Qiu