Related papers: Causal Diffusion Transformers for Generative Model…

D-AR: Diffusion via Autoregressive Models

This paper presents Diffusion via Autoregressive models (D-AR), a new paradigm recasting the image diffusion process as a vanilla autoregressive procedure in the standard next-token-prediction fashion. We start by designing the tokenizer…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-30 · Ziteng Gao, Mike Zheng Shou

MADFormer: Mixed Autoregressive and Diffusion Transformers for Continuous Image Generation

Recent progress in multimodal generation has increasingly combined autoregressive (AR) and diffusion-based approaches, leveraging their complementary strengths: AR models capture long-range dependencies and produce fluent, context-aware…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-10 · Junhao Chen, Yulia Tsvetkov, Xiaochuang Han

Next Tokens Denoising for Speech Synthesis

While diffusion and autoregressive (AR) models have significantly advanced generative modeling, they each present distinct limitations. AR models, which rely on causal attention, cannot exploit future context and suffer from slow generation…

Sound · Computer Science · 2025-08-04 · Yanqing Liu, Ruiqing Xue, Chong Zhang, Yufei Liu, Gang Wang, Bohan Li, Yao Qian, Lei He, Shujie Liu, Sheng Zhao

CausalDiffTab: Mixed-Type Causal-Aware Diffusion for Tabular Data Generation

Training data has been proven to be one of the most critical components in training generative AI. However, obtaining high-quality data remains challenging, with data privacy issues presenting a significant hurdle. To address the need for…

Computation and Language · Computer Science · 2025-06-18 · Jia-Chen Zhang, Zheng Zhou, Yu-Jie Xiong, Chun-Ming Xia, Fei Dai

Causal Motion Diffusion Models for Autoregressive Motion Generation

Recent advances in motion diffusion models have substantially improved the realism of human motion synthesis. However, existing approaches either rely on full-sequence diffusion models with bidirectional generation, which limits temporal…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-27 · Qing Yu, Akihisa Watanabe, Kent Fujiwara

Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

This paper presents Diffusion Forcing, a new training paradigm where a diffusion model is trained to denoise a set of tokens with independent per-token noise levels. We apply Diffusion Forcing to sequence generative modeling by training a…

Machine Learning · Computer Science · 2024-12-11 · Boyuan Chen, Diego Marti Monso, Yilun Du, Max Simchowitz, Russ Tedrake, Vincent Sitzmann

Revisiting Autoregressive Models for Generative Image Classification

Class-conditional generative models have emerged as accurate and robust classifiers, with diffusion models demonstrating clear advantages over other visual generative paradigms, including autoregressive (AR) models. In this work, we revisit…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-20 · Ilia Sudakov, Artem Babenko, Dmitry Baranchuk

Transition Matching: Scalable and Flexible Generative Modeling

Diffusion and flow matching models have significantly advanced media generation, yet their design space is well-explored, somewhat limiting further improvements. Concurrently, autoregressive (AR) models, particularly those generating…

Machine Learning · Computer Science · 2025-07-01 · Neta Shaul, Uriel Singer, Itai Gat, Yaron Lipman

DiffCap: Exploring Continuous Diffusion on Image Captioning

Current image captioning works usually focus on generating descriptions in an autoregressive manner. However, there are limited works that focus on generating descriptions non-autoregressively, which brings more decoding diversity. Inspired…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-23 · Yufeng He, Zefan Cai, Xu Gan, Baobao Chang

From Prediction to Perfection: Introducing Refinement to Autoregressive Image Generation

Autoregressive (AR) image generators offer a language-model-friendly approach to image generation by predicting discrete image tokens in a causal sequence. However, unlike diffusion models, AR models lack a mechanism to refine previous…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-29 · Cheng Cheng, Lin Song, Di An, Yicheng Xiao, Xuchong Zhang, Hongbin Sun, Ying Shan

AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation

Diffusion models have gained significant attention in the realm of image generation due to their exceptional performance. Their success has been recently expanded to text generation via generating all tokens within a sequence concurrently.…

Computation and Language · Computer Science · 2023-12-14 · Tong Wu, Zhihao Fan, Xiao Liu, Yeyun Gong, Yelong Shen, Jian Jiao, Hai-Tao Zheng, Juntao Li, Zhongyu Wei, Jian Guo, Nan Duan, Weizhu Chen

Causal Autoregressive Diffusion Language Model

In this work, we propose Causal Autoregressive Diffusion (CARD), a novel framework that unifies the training efficiency of ARMs with the high-throughput inference of diffusion models. CARD reformulates the diffusion process within a…

Computation and Language · Computer Science · 2026-01-30 · Junhao Ruan, Bei Li, Yongjing Yin, Pengcheng Huang, Xin Chen, Jingang Wang, Xunliang Cai, Tong Xiao, JingBo Zhu

Causal Diffusion Autoencoders: Toward Counterfactual Generation via Diffusion Probabilistic Models

Diffusion probabilistic models (DPMs) have become the state-of-the-art in high-quality image generation. However, DPMs have an arbitrary noisy latent space with no interpretable or controllable semantics. Although there has been significant…

Machine Learning · Computer Science · 2024-08-27 · Aneesh Komanduri, Chen Zhao, Feng Chen, Xintao Wu

ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding

Autoregressive models (ARMs) are hindered by slow sequential inference. While masked diffusion models (MDMs) offer a parallel alternative, they suffer from critical drawbacks: high computational overhead from precluding Key-Value (KV)…

Computation and Language · Computer Science · 2026-03-06 · Jia-Nan Li, Jian Guan, Wei Wu, Chongxuan Li

Autoregressive Image Generation without Vector Quantization

Conventional wisdom holds that autoregressive models for image generation are typically accompanied by vector-quantized tokens. We observe that while a discrete-valued space can facilitate representing a categorical distribution, it is not…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-04 · Tianhong Li, Yonglong Tian, He Li, Mingyang Deng, Kaiming He

From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons

Diffusion models promise efficient parallel text generation but rely on bidirectional attention, creating a structural mismatch with pre-trained Autoregressive (AR) models. This incompatibility precludes reusing robust AR priors,…

Computation and Language · Computer Science · 2026-05-29 · Xiangyu Ma, Teng Xiao, Zuchao Li, Lefei Zhang

Dolfin: Diffusion Layout Transformers without Autoencoder

In this paper, we introduce a novel generative model, Diffusion Layout Transformers without Autoencoder (Dolfin), which significantly improves the modeling capability with reduced complexity compared to existing methods. Dolfin employs a…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-26 · Yilin Wang, Zeyuan Chen, Liangjun Zhong, Zheng Ding, Zhizhou Sha, Zhuowen Tu

Computer-Aided Design Generation by Cascaded Discrete Diffusion Model

Recent deep learning approaches seek to automate CAD creation by representing a model as a sequence of discrete commands and parameters, and then generating them using autoregressive models or continuous diffusion operating in Euclidean…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-07 · Honghu Pan, Xiaoling Luo, Yongyong Chen, Zhenyu He, Pengyang Wang

Proxy-Tuning: Tailoring Multimodal Autoregressive Models for Subject-Driven Image Generation

Multimodal autoregressive (AR) models, based on next-token prediction and transformer architecture, have demonstrated remarkable capabilities in various multimodal tasks including text-to-image (T2I) generation. Despite their strong…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-02 · Yi Wu, Shengju Qian, Lingting Zhu, Lei Liu, Wandi Qiao, Ziqiang Li, Lequan Yu, Bin Li

Diffusion-Free Graph Generation with Next-Scale Prediction

Autoregressive models excel in efficiency and plug directly into the transformer ecosystem, delivering robust generalization, predictable scalability, and seamless workflows such as fine-tuning and parallelized training. However, they…

Machine Learning · Computer Science · 2025-06-13 · Samuel Belkadi, Steve Hong, Marian Chen, Miruna Cretu, Charles Harris, Pietro Lio