Related papers: Embedding Inversion via Conditional Masked Diffusi…

Mask-Predict: Parallel Decoding of Conditional Masked Language Models

Most machine translation systems generate text autoregressively from left to right. We, instead, use a masked language modeling objective to train a model to predict any subset of the target words, conditioned on both the input text and a…

Computation and Language · Computer Science · 2019-09-05 · Marjan Ghazvininejad, Omer Levy, Yinhan Liu, Luke Zettlemoyer

Soft-Masked Diffusion Language Models

Diffusion models have demonstrated strong potential in language modeling, offering various advantages over traditional autoregressive approaches. Their ability to generate and revise entire responses in parallel enables faster generation…

Machine Learning · Computer Science · 2026-03-03 · Michael Hersche, Samuel Moor-Smith, Thomas Hofmann, Abbas Rahimi

BeamClean: Language Aware Embedding Reconstruction

In this work, we consider an inversion attack on the obfuscated input embeddings sent to a language model on a server, where the adversary has no access to the language model or the obfuscation mechanism and sees only the obfuscated…

Cryptography and Security · Computer Science · 2025-05-21 · Kaan Kale, Kyle Mylonakis, Jay Roberts, Sidhartha Roy

Infusing Sequential Information into Conditional Masked Translation Model with Self-Review Mechanism

Non-autoregressive models generate target words in a parallel way, which achieve a faster decoding speed but at the sacrifice of translation accuracy. To remedy a flawed translation by non-autoregressive models, a promising approach is to…

Computation and Language · Computer Science · 2020-10-27 · Pan Xie, Zhi Cui, Xiuyin Chen, Xiaohui Hu, Jianwei Cui, Bin Wang

MaskInversion: Localized Embeddings via Optimization of Explainability Maps

Vision-language foundation models such as CLIP have achieved tremendous results in global vision-language alignment, but still show some limitations in creating representations for specific image regions. To address this problem, we…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-16 · Walid Bousselham, Sofian Chaybouti, Christian Rupprecht, Vittorio Ferrari, Hilde Kuehne

ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding

Autoregressive models (ARMs) are hindered by slow sequential inference. While masked diffusion models (MDMs) offer a parallel alternative, they suffer from critical drawbacks: high computational overhead from precluding Key-Value (KV)…

Computation and Language · Computer Science · 2026-03-06 · Jia-Nan Li, Jian Guan, Wei Wu, Chongxuan Li

Self-conditioned Embedding Diffusion for Text Generation

Can continuous diffusion models bring the same performance breakthrough on natural language they did for image generation? To circumvent the discrete nature of text data, we can simply project tokens in a continuous space of embeddings, as…

Computation and Language · Computer Science · 2022-11-09 · Robin Strudel, Corentin Tallec, Florent Altché, Yilun Du, Yaroslav Ganin, Arthur Mensch, Will Grathwohl, Nikolay Savinov, Sander Dieleman, Laurent Sifre, Rémi Leblond

Corrective Diffusion Language Models

While Diffusion Language Models (DLMs) are theoretically well-suited for iterative refinement due to their non-causal structure, they often fail to reliably revise incorrect tokens in practice. The key challenge lies in the model's…

Machine Learning · Computer Science · 2026-01-30 · Shuibai Zhang, Fred Zhangzhi Peng, Yiheng Zhang, Jin Pan, Grigorios G. Chrysos

Reversible Diffusion Decoding for Diffusion Language Models

Diffusion language models enable parallel token generation through block-wise decoding, but their irreversible commitments can lead to stagnation, where the reverse diffusion process fails to make further progress under a suboptimal…

Computation and Language · Computer Science · 2026-02-03 · Xinyun Wang, Min Zhang, Sen Cui, Zhikang Chen, Bo Jiang, Kun Kuang, Mingbao Lin

Fast Training of Diffusion Models with Masked Transformers

We propose an efficient approach to train large diffusion models with masked transformers. While masked transformers have been extensively explored for representation learning, their application to generative learning is less explored in…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-06 · Hongkai Zheng, Weili Nie, Arash Vahdat, Anima Anandkumar

TransFusion: Transcribing Speech with Multinomial Diffusion

Diffusion models have shown exceptional scaling properties in the image synthesis domain, and initial attempts have shown similar benefits for applying diffusion to unconditional text synthesis. Denoising diffusion models attempt to…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-10-17 · Matthew Baas, Kevin Eloff, Herman Kamper

Non-Autoregressive Machine Translation with Disentangled Context Transformer

State-of-the-art neural machine translation models generate a translation from left to right and every step is conditioned on the previously generated tokens. The sequential nature of this generation process causes fundamental latency in…

Computation and Language · Computer Science · 2020-07-01 · Jungo Kasai, James Cross, Marjan Ghazvininejad, Jiatao Gu

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

We introduce Transfusion, a recipe for training a multi-modal model over discrete and continuous data. Transfusion combines the language modeling loss function (next token prediction) with diffusion to train a single transformer over…

Artificial Intelligence · Computer Science · 2024-08-21 · Chunting Zhou, Lili Yu, Arun Babu, Kushal Tirumala, Michihiro Yasunaga, Leonid Shamis, Jacob Kahn, Xuezhe Ma, Luke Zettlemoyer, Omer Levy

Simplified and Generalized Masked Diffusion for Discrete Data

Masked (or absorbing) diffusion is actively explored as an alternative to autoregressive models for generative modeling of discrete data. However, existing work in this area has been hindered by unnecessarily complex model formulations and…

Machine Learning · Computer Science · 2025-01-17 · Jiaxin Shi, Kehang Han, Zhe Wang, Arnaud Doucet, Michalis K. Titsias

Just on Time: Token-Level Early Stopping for Diffusion Language Models

Diffusion language models generate text through iterative refinement, a process that is often computationally inefficient because many tokens reach stability long before the final denoising step. We introduce a training-free, token-level…

Machine Learning · Computer Science · 2026-02-12 · Zahar Kohut, Severyn Shykula, Dmytro Khamula, Mykola Vysotskyi, Taras Rumezhak, Volodymyr Karpiv

Window-Diffusion: Accelerating Diffusion Language Model Inference with Windowed Token Pruning and Caching

Diffusion language models (DLMs) generate text through iterative denoising, but inference requires full-sequence attention at every iteration, resulting in substantial redundant computation on masked tokens. Block-wise diffusion can reduce…

Machine Learning · Computer Science · 2026-02-03 · Fengrui Zuo, Zhiwei Ke, Yiming Liu, Wenqi Lou, Chao Wang, Xuehai Zhou

Learn from Your Mistakes: Self-Correcting Masked Diffusion Models

Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models, enabling parallel token generation while achieving competitive performance. Despite these advantages, MDMs face a fundamental limitation: once…

Machine Learning · Computer Science · 2026-03-06 · Yair Schiff, Omer Belhasin, Roy Uziel, Guanghan Wang, Marianne Arriola, Gilad Turok, Michael Elad, Volodymyr Kuleshov

Remasking Discrete Diffusion Models with Inference-Time Scaling

Part of the success of diffusion models stems from their ability to perform iterative refinement, i.e., repeatedly correcting outputs during generation. However, modern masked discrete diffusion lacks this capability: when a token is…

Machine Learning · Computer Science · 2026-02-10 · Guanghan Wang, Yair Schiff, Subham Sekhar Sahoo, Volodymyr Kuleshov

Edit-Based Refinement for Parallel Masked Diffusion Language Models

Masked diffusion language models enable parallel token generation and offer improved decoding efficiency over autoregressive models. However, their performance degrades significantly when generating multiple tokens simultaneously, due to a…

Computation and Language · Computer Science · 2026-05-12 · Houxing Ren, Mingjie Zhan, Zimu Lu, Ke Wang, Yunqiao Yang, Haotian Hou, Junting Pan, Hongsheng Li

Full waveform inversion method based on diffusion model

Seismic full-waveform inversion is a core technology for obtaining high-resolution subsurface model parameters. However, its highly nonlinear characteristics and strong dependence on the initial model often lead to the inversion process…

Machine Learning · Computer Science · 2026-03-25 · Caiyun Liu, Siyang Pei, Qingfeng Yu, Jie Xiong