Embedding Inversion via Conditional Masked Diffusion Language Models

Computation and Language 2026-02-19 v3

Abstract

We frame embedding inversion as conditional masked diffusion: all tokens are recovered in parallel through iterative denoising rather than through sequential autoregressive generation. A masked diffusion language model is conditioned on the target embedding via adaptive layer normalization, requiring only 8 forward passes and no access to the target encoder at inference time. On 32-token sequences across three embedding models, the method recovers tokens through parallel denoising without encoder access, iterative correction, or architecture-specific alignment. Source code and a live demo are available at https://github.com/jina-ai/embedding-inversion-demo.
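
The decoding loop described in the abstract can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the toy denoiser with random weights, the `(1 + scale)` form of adaptive layer normalization, the confidence-based unmasking schedule, and all names and sizes other than the 32-token length and the 8 forward passes. It is not the paper's model or the repository's code, and with untrained weights the recovered tokens are meaningless.

```python
import numpy as np

VOCAB = 50            # hypothetical toy vocabulary size
MASK_ID = VOCAB       # hypothetical id reserved for the [MASK] token
SEQ_LEN = 32          # sequence length mentioned in the abstract
STEPS = 8             # number of forward passes mentioned in the abstract
D, COND_DIM = 16, 24  # hypothetical hidden width and embedding width


def init_params(rng):
    """Random weights for the toy denoiser (a real model would be trained)."""
    return {
        "emb": rng.normal(size=(VOCAB + 1, D)),   # +1 row for [MASK]
        "pos": rng.normal(size=(SEQ_LEN, D)),
        "w_scale": rng.normal(size=(COND_DIM, D)) * 0.1,
        "w_shift": rng.normal(size=(COND_DIM, D)) * 0.1,
        "w_out": rng.normal(size=(D, VOCAB)),     # logits never predict [MASK]
    }


def ada_layer_norm(h, cond, w_scale, w_shift, eps=1e-5):
    """Layer-normalize h, then scale and shift with values computed from cond."""
    mean = h.mean(axis=-1, keepdims=True)
    var = h.var(axis=-1, keepdims=True)
    normed = (h - mean) / np.sqrt(var + eps)
    return normed * (1.0 + cond @ w_scale) + cond @ w_shift


def toy_denoiser(tokens, cond, p):
    """Stand-in for the masked diffusion LM: returns (seq_len, VOCAB) logits."""
    h = p["emb"][tokens] + p["pos"][: len(tokens)]
    h = ada_layer_norm(h, cond, p["w_scale"], p["w_shift"])
    return h @ p["w_out"]


def invert(cond, denoiser, seq_len=SEQ_LEN, steps=STEPS):
    """Start fully masked; each pass commits the most confident predictions."""
    tokens = np.full(seq_len, MASK_ID)
    for step in range(steps):
        masked = np.flatnonzero(tokens == MASK_ID)
        if masked.size == 0:
            break
        logits = denoiser(tokens, cond)  # one forward pass, all positions at once
        logits = logits - logits.max(axis=-1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=-1, keepdims=True)
        pred, conf = probs.argmax(axis=-1), probs.max(axis=-1)
        # Spread the remaining masked positions evenly over the remaining passes.
        k = int(np.ceil(masked.size / (steps - step)))
        keep = masked[np.argsort(-conf[masked])[:k]]
        tokens[keep] = pred[keep]
    return tokens


rng = np.random.default_rng(0)
params = init_params(rng)
target_embedding = rng.normal(size=COND_DIM)  # placeholder for a real embedding
recovered = invert(target_embedding, lambda t, c: toy_denoiser(t, c, params))
print(recovered)
```

Note that the loop only calls the denoiser and never the encoder that produced `target_embedding`, which mirrors the abstract's claim that no encoder access is needed at inference time.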

Cite

@article{arxiv.2602.11047,
  title   = {Embedding Inversion via Conditional Masked Diffusion Language Models},
  author  = {Han Xiao},
  journal = {arXiv preprint arXiv:2602.11047},
  year    = {2026}
}

Comments

8 pages, 3 figures, 4 tables. Code and demo: https://github.com/jina-ai/embedding-inversion-demo
