Language Models Without a Trainable Input Embedding Table: Learning from Fixed Minimal Binary Token Codes

A. Bochkov

Language Models Without a Trainable Input Embedding Table: Learning from Fixed Minimal Binary Token Codes

Computation and Language 2026-05-12 v1

Authors: A. Bochkov

Abstract

Trainable input embedding tables are a standard component of modern language models. We ask whether they are actually necessary at the input interface. For a vocabulary of size $V$ , exact token identity requires only $K=\lceil \log_2 V\rceil$ bits. We replace the usual trainable $V\times d_{\text{model}}$ input embedding matrix with fixed minimal binary token codes and a zero-parameter lift to model width. In our main setting, $V=65{,}536$ , so $K=16$ , and tokens are represented by fixed 16-dimensional binary codes tiled to $d_{\text{model}}=1024$ . We also evaluate a fully table-free variant in which codes are generated from token IDs on the fly and randomly recoded by an invertible affine transform over $\mathbb{F}_2^K$ . Across matched 32-layer decoder-only models trained on approximately 17B tokens and evaluated over three independent training seeds, fixed minimal codes achieve comparable held-out validation perplexity to a standard learned-input baseline while removing 67.1M trainable input parameters. The fixed-code runs have a lower mean validation perplexity in our experiments, 2.36 versus 2.44, but the observed gap is within the measured seed-to-seed variation of 4.8\%; we therefore interpret the result as evidence that the trainable input table is not necessary, rather than as a statistically resolved superiority claim. The table-free affine-recoded variant remains close at 2.39 despite a slightly shorter training run. These results show that, in this regime, a trainable input embedding table is not necessary for useful language modeling. The output projection remains standard and trainable.

Keywords

tokenization transformer

Cite

@article{arxiv.2605.09751,
  title  = {Language Models Without a Trainable Input Embedding Table: Learning from Fixed Minimal Binary Token Codes},
  author = {A. Bochkov},
  journal= {arXiv preprint arXiv:2605.09751},
  year   = {2026}
}

Language Models Without a Trainable Input Embedding Table: Learning from Fixed Minimal Binary Token Codes

Abstract

Keywords

Cite

Related papers