Related papers: Language Model Memory and Memory Models for Langua…

Learning to Recall with Transformers Beyond Orthogonal Embeddings

Modern large language models (LLMs) excel at tasks that require storing and retrieving knowledge, such as factual recall and question answering. Transformers are central to this capability because they can encode information during training…

Machine Learning · Statistics 2026-03-18 Nuri Mert Vural, Alberto Bietti, Mahdi Soltanolkotabi, Denny Wu

Models In a Spelling Bee: Language Models Implicitly Learn the Character Composition of Tokens

Standard pretrained language models operate on sequences of subword tokens without direct access to the characters that compose each token's string representation. We probe the embedding layer of pretrained language models and show that…

Computation and Language · Computer Science 2022-06-09 Itay Itzhak, Omer Levy

Embedding Word Similarity with Neural Machine Translation

Neural language models learn word representations, or embeddings, that capture rich linguistic and conceptual information. Here we investigate the embeddings learned by neural machine translation models, a recently-developed class of neural…

Computation and Language · Computer Science 2015-04-06 Felix Hill, Kyunghyun Cho, Sebastien Jean, Coline Devin, Yoshua Bengio

Towards General Continuous Memory for Vision-Language Models

Language models (LMs) and their extension, vision-language models (VLMs), have achieved remarkable performance across various tasks. However, they still struggle with complex reasoning tasks that require multimodal or multilingual…

Machine Learning · Computer Science 2025-07-09 Wenyi Wu, Zixuan Song, Kun Zhou, Yifei Shao, Zhiting Hu, Biwei Huang

Learning to Embed Words in Context for Syntactic Tasks

We present models for embedding words in the context of surrounding words. Such models, which we refer to as token embeddings, represent the characteristics of a word that are specific to a given context, such as word sense, syntactic…

Computation and Language · Computer Science 2017-06-13 Lifu Tu, Kevin Gimpel, Karen Livescu

Language Models are Universal Embedders

In the large language model (LLM) revolution, embedding is a key component of various systems, such as retrieving knowledge or memories for LLMs or building content moderation filters. As such cases span from English to other natural or…

Computation and Language · Computer Science 2025-05-23 Xin Zhang, Zehan Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, Min Zhang

Learning with Memory Embeddings

Embedding learning, a.k.a. representation learning, has been shown to be able to model large-scale semantic knowledge graphs. A key concept is a mapping of the knowledge graph to a tensor representation whose entries are predicted by models…

Artificial Intelligence · Computer Science 2016-05-10 Volker Tresp, Cristóbal Esteban, Yinchong Yang, Stephan Baier, Denis Krompaß

Memorizing Transformers

Language models typically need to be trained or finetuned in order to acquire new knowledge, which involves updating their weights. We instead envision language models that can simply read and memorize new data at inference time, thus…

Machine Learning · Computer Science 2022-03-18 Yuhuai Wu, Markus N. Rabe, DeLesley Hutchins, Christian Szegedy

Vocabulary-level Memory Efficiency for Language Model Fine-tuning

The extensive memory footprint of language model (LM) fine-tuning poses a challenge for both researchers and practitioners. LMs use an embedding matrix to represent extensive vocabularies, forming a substantial proportion of the model…

Computation and Language · Computer Science 2025-03-26 Miles Williams, Nikolaos Aletras

Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling

Recurrent neural networks have been very successful at predicting sequences of words in tasks such as language modeling. However, all such models are based on the conventional classification framework, where the model is trained against…

Machine Learning · Computer Science 2017-03-14 Hakan Inan, Khashayar Khosravi, Richard Socher

Information Leakage in Embedding Models

Embeddings are functions that map raw input data to low-dimensional vector representations, while preserving important semantic information about the inputs. Pre-training embeddings on a large amount of unlabeled data and fine-tuning them…

Machine Learning · Computer Science 2020-08-21 Congzheng Song, Ananth Raghunathan

Embedding Words and Senses Together via Joint Knowledge-Enhanced Training

Word embeddings are widely used in Natural Language Processing, mainly due to their success in capturing semantic information from massive corpora. However, their creation process does not allow the different meanings of a word to be…

Computation and Language · Computer Science 2017-06-22 Massimiliano Mancini, Jose Camacho-Collados, Ignacio Iacobacci, Roberto Navigli

Knowledge Transfer from Large-scale Pretrained Language Models to End-to-end Speech Recognizers

End-to-end speech recognition is a promising technology for enabling compact automatic speech recognition (ASR) systems since it can unify the acoustic and language model into a single neural network. However, as a drawback, training of…

Computation and Language · Computer Science 2022-02-17 Yotaro Kubo, Shigeki Karita, Michiel Bacchiani

Arithmetic with Language Models: from Memorization to Computation

A better understanding of the emergent computation and problem-solving capabilities of recent large language models is of paramount importance to further improve them and broaden their applicability. This work investigates how a language…

Artificial Intelligence · Computer Science 2024-08-05 Davide Maltoni, Matteo Ferrara

Learned In Speech Recognition: Contextual Acoustic Word Embeddings

End-to-end acoustic-to-word speech recognition models have recently gained popularity because they are easy to train, scale well to large amounts of training data, and do not require a lexicon. In addition, word models may also be easier to…

Computation and Language · Computer Science 2019-02-20 Shruti Palaskar, Vikas Raunak, Florian Metze

Word Embeddings Are Steers for Language Models

Language models (LMs) automatically learn word embeddings during pre-training on language corpora. Although word embeddings are usually interpreted as feature vectors for individual words, their roles in language model generation remain…

Computation and Language · Computer Science 2024-06-07 Chi Han, Jialiang Xu, Manling Li, Yi Fung, Chenkai Sun, Nan Jiang, Tarek Abdelzaher, Heng Ji

Distilling Relation Embeddings from Pre-trained Language Models

Pre-trained language models have been found to capture a surprisingly rich amount of lexical knowledge, ranging from commonsense properties of everyday concepts to detailed factual knowledge about named entities. Among others, this makes it…

Computation and Language · Computer Science 2022-09-12 Asahi Ushio, Jose Camacho-Collados, Steven Schockaert

MEMEN: Multi-layer Embedding with Memory Networks for Machine Comprehension

Machine comprehension (MC) style question answering is a representative problem in natural language processing. Previous methods rarely spend time on the improvement of the encoding layer, especially the embedding of syntactic information and…

Artificial Intelligence · Computer Science 2017-07-31 Boyuan Pan, Hao Li, Zhou Zhao, Bin Cao, Deng Cai, Xiaofei He

Forgetting in Language Models: Capacity, Optimization, and Self-Generated Replay

Models trained on a new task typically degrade on prior tasks, a phenomenon known as forgetting. Traditionally, mitigating forgetting has required replaying stored exemplars from prior tasks, which is often impractical. By contrast,…

Machine Learning · Computer Science 2026-05-26 Martin Marek, Dongkyu Cho, Shikai Qiu, Rumi Chunara, Pavel Izmailov, Andrew Gordon Wilson

Understanding Neural Machine Translation by Simplification: The Case of Encoder-free Models

In this paper, we try to understand neural machine translation (NMT) via simplifying NMT architectures and training encoder-free NMT models. In an encoder-free model, the sums of word embeddings and positional embeddings represent the…

Computation and Language · Computer Science 2019-07-19 Gongbo Tang, Rico Sennrich, Joakim Nivre