Language Models Without a Trainable Input Embedding Table: Learning from Fixed Minimal Binary Token Codes
Abstract
Trainable input embedding tables are a standard component of modern language models. We ask whether they are actually necessary at the input interface. For a vocabulary of size $V$, exact token identity requires only $\lceil \log_2 V \rceil$ bits. We replace the usual trainable input embedding matrix with fixed minimal binary token codes and a zero-parameter lift to model width. In our main setting, $V = 65{,}536$, so $\lceil \log_2 V \rceil = 16$, and tokens are represented by fixed 16-dimensional binary codes tiled to $d_{\text{model}} = 1024$. We also evaluate a fully table-free variant in which codes are generated from token IDs on the fly and randomly recoded by an invertible affine transform over $\mathbb{F}_2$. Across matched 32-layer decoder-only models trained on approximately 17B tokens and evaluated over three independent training seeds, fixed minimal codes achieve held-out validation perplexity comparable to that of a standard learned-input baseline while removing 67.1M trainable input parameters. The fixed-code runs have a lower mean validation perplexity in our experiments, 2.36 versus 2.44, but the observed gap is within the measured seed-to-seed variation of 4.8\%; we therefore interpret the result as evidence that the trainable input table is not necessary, rather than as a statistically resolved superiority claim. The table-free affine-recoded variant remains close at 2.39 despite a slightly shorter training run. These results show that, in this regime, a trainable input embedding table is not necessary for useful language modeling. The output projection remains standard and trainable.
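As an illustration of the input interface described in the abstract, the following is a minimal sketch, not the paper's implementation. It assumes a vocabulary of 65,536 tokens and a model width of 1024 (values consistent with 16-bit codes and 67.1M removed parameters), a little-endian bit order, codes valued in {0, 1}, and plain repetition as the tiling rule. All function names are hypothetical, and the way the invertible matrix is sampled (a product of unit triangular matrices) is one convenient choice rather than the method used in the paper.

```python
import random


def code_width(vocab_size: int) -> int:
    """Minimal number of bits needed to give every token ID a distinct code."""
    return max(1, (vocab_size - 1).bit_length())


def token_bits(token_id: int, n_bits: int) -> list[int]:
    """Binary code of a token ID (little-endian order is an assumption)."""
    return [(token_id >> i) & 1 for i in range(n_bits)]


def lift(bits: list[int], d_model: int) -> list[int]:
    """Zero-parameter lift: repeat the code until it fills the model width."""
    reps, rem = divmod(d_model, len(bits))
    if rem:
        raise ValueError("this sketch assumes d_model is a multiple of the code width")
    return bits * reps


def random_invertible_gf2(n: int, rng: random.Random) -> list[list[int]]:
    """Sample an invertible n x n matrix over GF(2).

    A unit lower-triangular matrix times a unit upper-triangular matrix is
    always invertible; this sampling scheme is an assumption of the sketch.
    """
    lower = [[1 if i == j else (rng.randint(0, 1) if j < i else 0)
              for j in range(n)] for i in range(n)]
    upper = [[1 if i == j else (rng.randint(0, 1) if j > i else 0)
              for j in range(n)] for i in range(n)]
    # Matrix product with arithmetic modulo 2 (sum of ANDs, then parity).
    return [[sum(lower[i][k] & upper[k][j] for k in range(n)) & 1
             for j in range(n)] for i in range(n)]


def affine_recode(bits: list[int], a: list[list[int]], b: list[int]) -> list[int]:
    """Apply y = A x + b over GF(2) to a binary code."""
    return [(sum(aij & xj for aij, xj in zip(row, bits)) & 1) ^ bi
            for row, bi in zip(a, b)]


# Usage: build the table-free input vector for one token ID.
rng = random.Random(0)
n_bits = code_width(65536)
a = random_invertible_gf2(n_bits, rng)
b = [rng.randint(0, 1) for _ in range(n_bits)]
x = lift(affine_recode(token_bits(12345, n_bits), a, b), 1024)
```

Because the affine map is invertible, distinct token IDs still receive distinct codes after recoding, so token identity is preserved without storing any per-token table.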
Cite
@article{arxiv.2605.09751,
title = {Language Models Without a Trainable Input Embedding Table: Learning from Fixed Minimal Binary Token Codes},
author = {A. Bochkov},
journal = {arXiv preprint arXiv:2605.09751},
year = {2026}
}