Language Model Inversion

Computation and Language 2023-11-27 v1 Machine Learning

Abstract

Language models produce a distribution over the next token; can we use this information to recover the prompt tokens? We consider the problem of language model inversion and show that next-token probabilities contain a surprising amount of information about the preceding text. Often we can recover the text in cases where it is hidden from the user, motivating a method for recovering unknown prompts given only the model's current distribution output. We consider a variety of model access scenarios, and show how even without predictions for every token in the vocabulary we can recover the probability vector through search. On Llama-2 7b, our inversion method reconstructs prompts with a BLEU of 59 and token-level F1 of 78 and recovers 27% of prompts exactly. Code for reproducing all experiments is available at http://github.com/jxmorris12/vec2text.
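
The abstract notes that the probability vector can be recovered through search even when the model does not expose predictions for every token. The sketch below illustrates one way such a search might work, and is not necessarily the procedure used in the paper. It assumes a hypothetical API that returns only the top token but accepts a per-token logit bias. The names `make_argmax_oracle`, `recover_probs`, `max_bias`, and `tol` are illustrative and do not come from the source or from any real library.

```python
import math


def make_argmax_oracle(logits):
    """Simulate a restricted API (hypothetical): it reveals only the index
    of the top token, after adding a caller-supplied bias to chosen logits."""
    def oracle(bias):
        biased = [l + bias.get(i, 0.0) for i, l in enumerate(logits)]
        return max(range(len(biased)), key=biased.__getitem__)
    return oracle


def recover_probs(oracle, vocab_size, max_bias=50.0, tol=1e-6):
    """Recover the full next-token distribution from argmax-only access.

    For each token i, bisect for the smallest bias that makes i the top
    token. That bias approximates logit[top] - logit[i]. Assumes every
    such gap is smaller than max_bias.
    """
    top = oracle({})               # unbiased top token
    gaps = [0.0] * vocab_size      # gaps[i] ~ logit[top] - logit[i]
    for i in range(vocab_size):
        if i == top:
            continue
        lo, hi = 0.0, max_bias
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if oracle({i: mid}) == i:
                hi = mid           # bias is large enough, try a smaller one
            else:
                lo = mid           # bias is too small
        gaps[i] = (lo + hi) / 2
    # Softmax is invariant to shifting all logits, so the gaps suffice.
    weights = [math.exp(-g) for g in gaps]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    hidden_logits = [2.0, 0.5, -1.0, 1.2]   # toy 4-token vocabulary
    probs = recover_probs(make_argmax_oracle(hidden_logits), len(hidden_logits))
    print([round(p, 4) for p in probs])
```

Each token costs roughly log2(max_bias / tol) oracle calls in this sketch, so the number of queries grows linearly with the vocabulary size.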

Cite

@article{arxiv.2311.13647,
  title  = {Language Model Inversion},
  author = {John X. Morris and Wenting Zhao and Justin T. Chiu and Vitaly Shmatikov and Alexander M. Rush},
  journal= {arXiv preprint arXiv:2311.13647},
  year   = {2023}
}