Related papers: Language Models Without a Trainable Input Embeddin…

200 papers

Transformer-based pre-trained language models are vocabulary-dependent, mapping by default each token to its corresponding embedding. This one-to-one mapping results in embedding matrices that occupy a lot of memory (i.e. millions of…

Computation and Language · Computer Science 2022-11-01 Huiyin Xue, Nikolaos Aletras

Large language models route every input through a learned embedding table of shape |V| x d_model, consuming hundreds of millions to billions of trainable parameters at frontier scale. We introduce Kronecker Embeddings, a deterministic…

Computation and Language · Computer Science 2026-05-29 Rohan Shravan

The ability of machine learning models to store input information in hidden layer vector embeddings, analogous to the concept of 'memory', is widely employed but not well characterized. We find that language model embeddings typically…

Computation and Language · Computer Science 2026-05-20 Benjamin L. Badger

Modern large language models (LLMs) excel at tasks that require storing and retrieving knowledge, such as factual recall and question answering. Transformers are central to this capability because they can encode information during training…

Machine Learning · Statistics 2026-03-18 Nuri Mert Vural, Alberto Bietti, Mahdi Soltanolkotabi, Denny Wu

Deep learning has enabled remarkable progress in binary code analysis. In particular, pre-trained embeddings of assembly code have become a gold standard for solving analysis tasks, such as measuring code similarity or recognizing…

Machine Learning · Computer Science 2025-02-14 Alwin Maier, Felix Weissberg, Konrad Rieck

Adapting language models to new data distributions by simple finetuning is challenging. This is due to the rigidity of their subword tokenizers, which typically remain unchanged during adaptation. This inflexibility often leads to…

Computation and Language · Computer Science 2026-05-14 Abraham Toluwase Owodunni, Orevaoghene Ahia, Sachin Kumar

Modern language models use a single matrix for input embedding and output projection. This couples two distinct objectives: token representation and discrimination over a vocabulary. This work introduces Leviathan, a Transformer…

Computation and Language · Computer Science 2026-05-08 Reza T. Batley, Sourav Saha

We introduce a simple modification to the embedding layer. The key change is to infuse token embeddings with information about their spelling. Models trained with these embeddings improve not only on spelling, but also across standard…

Machine Learning · Computer Science 2026-01-27 Markus N. Rabe, Judith Clymo, Zheren Dong

Pipelined NLP systems have largely been superseded by end-to-end neural modeling, yet nearly all commonly-used models still require an explicit tokenization step. While recent tokenization approaches based on data-derived subword lexicons…

Computation and Language · Computer Science 2022-05-19 Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting

We study a constrained training regime for decoder-only Transformers in which the token interface is fixed, previously trained dense blocks are not reopened, and the active trainable parameter set is kept approximately constant as depth…

Machine Learning · Computer Science 2026-05-05 A. Bochkov

Standard pretrained language models operate on sequences of subword tokens without direct access to the characters that compose each token's string representation. We probe the embedding layer of pretrained language models and show that…

Computation and Language · Computer Science 2022-06-09 Itay Itzhak, Omer Levy

Embedding layers in transformer-based NLP models typically account for the largest share of model parameters, scaling with vocabulary size but not yielding performance gains proportional to scale. We propose an alternative approach in which…

Computation and Language · Computer Science 2025-05-06 Henry Ndubuaku, Mouad Talhi

Commonly-used transformer language models depend on a tokenization schema which sets an unchangeable subword vocabulary prior to pre-training, destined to be applied to all downstream tasks regardless of domain shift, novel word formations,…

Computation and Language · Computer Science 2021-08-03 Yuval Pinter, Amanda Stent, Mark Dredze, Jacob Eisenstein

Token embeddings, a mapping from discrete lexical symbols to continuous vectors, are at the heart of any language model (LM). However, lexical symbol meanings can also be determined and even redefined by their structural role in a long…

Computation and Language · Computer Science 2023-05-29 Qian Huang, Eric Zelikman, Sarah Li Chen, Yuhuai Wu, Gregory Valiant, Percy Liang

Embedding matrices are key components in neural natural language processing (NLP) models that are responsible for providing numerical representations of input tokens (in this paper, words and subwords are referred to as tokens)…

Computation and Language · Computer Science 2022-04-19 Krtin Kumar, Peyman Passban, Mehdi Rezagholizadeh, Yiu Sing Lau, Qun Liu

Modern tokenizers employ deterministic algorithms to map text into a single "canonical" token sequence, yet the same string can be encoded as many non-canonical tokenizations using the tokenizer vocabulary. In this work, we investigate the…

Computation and Language · Computer Science 2026-02-04 Brian Siyuan Zheng, Alisa Liu, Orevaoghene Ahia, Jonathan Hayase, Yejin Choi, Noah A. Smith

Purely character-based language models (LMs) have been lagging in quality on large scale datasets, and current state-of-the-art LMs rely on word tokenization. It has been assumed that injecting the prior knowledge of a tokenizer into the…

Computation and Language · Computer Science 2019-08-28 Dokook Choe, Rami Al-Rfou, Mandy Guo, Heeyoung Lee, Noah Constant

Tabular neural networks (NNs) have attracted remarkable attention, and their recent advances have gradually narrowed the performance gap with respect to tree-based models on many public datasets. While mainstream work focuses on calibrating NNs to…

Machine Learning · Computer Science 2024-03-05 Xuan Li, Yun Wang, Bo Li

We frame embedding inversion as conditional masked diffusion, recovering all tokens in parallel through iterative denoising rather than sequential autoregressive generation. A masked diffusion language model is conditioned on the target…

Computation and Language · Computer Science 2026-02-19 Han Xiao

State-of-the-art language models are autoregressive and operate on subword units known as tokens. Specifically, one must encode the conditioning string into a list of tokens before passing to the language models for next-token prediction.…

Computation and Language · Computer Science 2024-07-09 Buu Phan, Marton Havasi, Matthew Muckley, Karen Ullrich