$\text{Memory}^3$: Language Modeling with Explicit Memory

Computation and Language 2025-01-29 v1 Artificial Intelligence Machine Learning

Abstract

The training and inference of large language models (LLMs) are together a costly process that transports knowledge from raw data to meaningful computation. Inspired by the memory hierarchy of the human brain, we reduce this cost by equipping LLMs with explicit memory, a memory format cheaper than model parameters and text retrieval-augmented generation (RAG). Conceptually, with most of its knowledge externalized to explicit memories, the LLM can enjoy a smaller parameter size, training cost, and inference cost, all proportional to the amount of remaining "abstract knowledge". As a preliminary proof of concept, we train from scratch a 2.4B LLM, which achieves better performance than much larger LLMs as well as RAG models, and maintains higher decoding speed than RAG. The model is named $\text{Memory}^3$, since explicit memory is the third form of memory in LLMs after implicit memory (model parameters) and working memory (context key-values). We introduce a memory circuitry theory to support the externalization of knowledge, and present novel techniques including a memory sparsification mechanism that makes storage tractable and a two-stage pretraining scheme that facilitates memory formation.

Cite

@article{arxiv.2407.01178,
  title  = {$\text{Memory}^3$: Language Modeling with Explicit Memory},
  author = {Hongkang Yang and Zehao Lin and Wenjin Wang and Hao Wu and Zhiyu Li and Bo Tang and Wenqiang Wei and Jinbo Wang and Zeyun Tang and Shichao Song and Chenyang Xi and Yu Yu and Kai Chen and Feiyu Xiong and Linpeng Tang and Weinan E},
  journal= {arXiv preprint arXiv:2407.01178},
  year   = {2025}
}