Related papers: Diffusion Forcing: Next-token Prediction Meets Ful…

Foresight Diffusion: Improving Sampling Consistency in Predictive Diffusion Models

Diffusion and flow-based models have enabled significant progress in generation tasks across various modalities and have recently found applications in predictive learning. However, unlike typical generation tasks that encourage sample…

Computer Vision and Pattern Recognition · Computer Science 2026-03-24 Yu Zhang, Xingzhuo Guo, Haoran Xu, Jialong Wu, Mingsheng Long

Dynamical Diffusion: Learning Temporal Dynamics with Diffusion Models

Diffusion models have emerged as powerful generative frameworks by progressively adding noise to data through a forward process and then reversing this process to generate realistic samples. While these models have achieved strong…

Machine Learning · Computer Science 2025-03-04 Xingzhuo Guo, Yu Zhang, Baixu Chen, Haoran Xu, Jianmin Wang, Mingsheng Long

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

We introduce Transfusion, a recipe for training a multi-modal model over discrete and continuous data. Transfusion combines the language modeling loss function (next token prediction) with diffusion to train a single transformer over…

Artificial Intelligence · Computer Science 2024-08-21 Chunting Zhou, Lili Yu, Arun Babu, Kushal Tirumala, Michihiro Yasunaga, Leonid Shamis, Jacob Kahn, Xuezhe Ma, Luke Zettlemoyer, Omer Levy

Unifying Continuous and Discrete Text Diffusion with Non-simultaneous Diffusion Processes

Diffusion models have emerged as a promising approach for text generation, with recent works falling into two main categories: discrete and continuous diffusion models. Discrete diffusion models apply token corruption independently using…

Computation and Language · Computer Science 2025-05-29 Bocheng Li, Zhujin Gao, Linli Xu

Computational Tradeoffs in Image Synthesis: Diffusion, Masked-Token, and Next-Token Prediction

Nearly every recent image synthesis approach, including diffusion, masked-token prediction, and next-token prediction, uses a Transformer network architecture. Despite this common backbone, there has been no direct, compute controlled…

Computer Vision and Pattern Recognition · Computer Science 2024-05-27 Maciej Kilian, Varun Jampani, Luke Zettlemoyer

Training Diffusion Models with Reinforcement Learning

Diffusion models are a class of flexible generative models trained with an approximation to the log-likelihood objective. However, most use cases of diffusion models are not concerned with likelihoods, but instead with downstream objectives…

Machine Learning · Computer Science 2024-01-08 Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, Sergey Levine

ForeDiffusion: Foresight-Conditioned Diffusion Policy via Future View Construction for Robot Manipulation

Diffusion strategies have advanced visual motor control by progressively denoising high-dimensional action sequences, providing a promising method for robot manipulation. However, as task complexity increases, the success rate of existing…

Robotics · Computer Science 2026-01-21 Weize Xie, Yi Ding, Ying He, Leilei Wang, Binwen Bai, Zheyi Zhao, Chenyang Wang, F. Richard Yu

ProtoDiffusion: Classifier-Free Diffusion Guidance with Prototype Learning

Diffusion models are generative models that have shown significant advantages compared to other generative models in terms of higher generation quality and more stable training. However, the computational need for training diffusion models…

Computer Vision and Pattern Recognition · Computer Science 2023-07-06 Gulcin Baykal, Halil Faruk Karagoz, Taha Binhuraib, Gozde Unal

Reasoning with Latent Tokens in Diffusion Language Models

Discrete diffusion models have recently become competitive with autoregressive models for language modeling, even outperforming them on reasoning tasks requiring planning and global coherence, but they require more computation at inference…

Machine Learning · Computer Science 2026-02-04 Andre He, Sean Welleck, Daniel Fried

DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers

Diffusion models have demonstrated remarkable success in various image generation tasks, but their performance is often limited by the uniform processing of inputs across varying conditions and noise levels. To address this limitation, we…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Minglei Shi, Ziyang Yuan, Haotian Yang, Xintao Wang, Mingwu Zheng, Xin Tao, Wenliang Zhao, Wenzhao Zheng, Jie Zhou, Jiwen Lu, Pengfei Wan, Di Zhang, Kun Gai

An Overview of Diffusion Models: Applications, Guided Generation, Statistical Rates and Optimization

Diffusion models, a powerful and universal generative AI technology, have achieved tremendous success in computer vision, audio, reinforcement learning, and computational biology. In these applications, diffusion models provide flexible…

Machine Learning · Computer Science 2024-04-12 Minshuo Chen, Song Mei, Jianqing Fan, Mengdi Wang

TransFusion: Transcribing Speech with Multinomial Diffusion

Diffusion models have shown exceptional scaling properties in the image synthesis domain, and initial attempts have shown similar benefits for applying diffusion to unconditional text synthesis. Denoising diffusion models attempt to…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-17 Matthew Baas, Kevin Eloff, Herman Kamper

Diffusion Predictive Control with Constraints

Diffusion models have become popular for policy learning in robotics due to their ability to capture high-dimensional and multimodal distributions. However, diffusion policies are stochastic and typically trained offline, limiting their…

Robotics · Computer Science 2025-05-28 Ralf Römer, Alexander von Rohr, Angela P. Schoellig

Thompson Sampling with Diffusion Generative Prior

In this work, we initiate the idea of using denoising diffusion models to learn priors for online decision making problems. Our special focus is on the meta-learning for bandit framework, with the goal of learning a strategy that performs…

Machine Learning · Computer Science 2023-01-31 Yu-Guan Hsieh, Shiva Prasad Kasiviswanathan, Branislav Kveton, Patrick Blöbaum

Learning To Sample From Diffusion Models Via Inverse Reinforcement Learning

Diffusion models generate samples through an iterative denoising process, guided by a neural network. While training the denoiser on real-world data is computationally demanding, the sampling procedure itself is more flexible. This…

Machine Learning · Computer Science 2026-02-10 Constant Bourdrez, Alexandre Vérine, Olivier Cappé

Diffusion Models for Time Series Applications: A Survey

Diffusion models, a family of generative models based on deep learning, have become increasingly prominent in cutting-edge machine learning research. With a distinguished performance in generating samples that resemble the observed data,…

Machine Learning · Computer Science 2023-05-02 Lequan Lin, Zhengkun Li, Ruikun Li, Xuliang Li, Junbin Gao

Rethinking Token Prediction: Tree-Structured Diffusion Language Model

Discrete diffusion language models have emerged as a competitive alternative to auto-regressive language models, but training them efficiently under limited parameter and memory budgets remains challenging. Modern architectures are…

Computation and Language · Computer Science 2026-04-07 Zihao Wu, Haoming Yang, Juncheng Dong, Vahid Tarokh

Just on Time: Token-Level Early Stopping for Diffusion Language Models

Diffusion language models generate text through iterative refinement, a process that is often computationally inefficient because many tokens reach stability long before the final denoising step. We introduce a training-free, token-level…

Machine Learning · Computer Science 2026-02-12 Zahar Kohut, Severyn Shykula, Dmytro Khamula, Mykola Vysotskyi, Taras Rumezhak, Volodymyr Karpiv

FloodDiffusion: Tailored Diffusion Forcing for Streaming Motion Generation

We present FloodDiffusion, a new framework for text-driven, streaming human motion generation. Given time-varying text prompts, FloodDiffusion generates text-aligned, seamless motion sequences with real-time latency. Unlike existing methods…

Computer Vision and Pattern Recognition · Computer Science 2026-02-09 Yiyi Cai, Yuhan Wu, Kunhang Li, You Zhou, Bo Zheng, Haiyang Liu

Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion

We introduce Self Forcing, a novel training paradigm for autoregressive video diffusion models. It addresses the longstanding issue of exposure bias, where models trained on ground-truth context must generate sequences conditioned on their…

Computer Vision and Pattern Recognition · Computer Science 2025-11-11 Xun Huang, Zhengqi Li, Guande He, Mingyuan Zhou, Eli Shechtman