Related papers: Causal Diffusion Transformers for Generative Model…

200 papers

This paper presents Diffusion via Autoregressive models (D-AR), a new paradigm recasting the image diffusion process as a vanilla autoregressive procedure in the standard next-token-prediction fashion. We start by designing the tokenizer…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-30 · Ziteng Gao, Mike Zheng Shou

Recent progress in multimodal generation has increasingly combined autoregressive (AR) and diffusion-based approaches, leveraging their complementary strengths: AR models capture long-range dependencies and produce fluent, context-aware…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-10 · Junhao Chen, Yulia Tsvetkov, Xiaochuang Han

While diffusion and autoregressive (AR) models have significantly advanced generative modeling, they each present distinct limitations. AR models, which rely on causal attention, cannot exploit future context and suffer from slow generation…

Sound · Computer Science · 2025-08-04 · Yanqing Liu, Ruiqing Xue, Chong Zhang, Yufei Liu, Gang Wang, Bohan Li, Yao Qian, Lei He, Shujie Liu, Sheng Zhao

Training data has been proven to be one of the most critical components in training generative AI. However, obtaining high-quality data remains challenging, with data privacy issues presenting a significant hurdle. To address the need for…

Computation and Language · Computer Science · 2025-06-18 · Jia-Chen Zhang, Zheng Zhou, Yu-Jie Xiong, Chun-Ming Xia, Fei Dai

Recent advances in motion diffusion models have substantially improved the realism of human motion synthesis. However, existing approaches either rely on full-sequence diffusion models with bidirectional generation, which limits temporal…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-27 · Qing Yu, Akihisa Watanabe, Kent Fujiwara

This paper presents Diffusion Forcing, a new training paradigm where a diffusion model is trained to denoise a set of tokens with independent per-token noise levels. We apply Diffusion Forcing to sequence generative modeling by training a…

Machine Learning · Computer Science · 2024-12-11 · Boyuan Chen, Diego Marti Monso, Yilun Du, Max Simchowitz, Russ Tedrake, Vincent Sitzmann

Class-conditional generative models have emerged as accurate and robust classifiers, with diffusion models demonstrating clear advantages over other visual generative paradigms, including autoregressive (AR) models. In this work, we revisit…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-20 · Ilia Sudakov, Artem Babenko, Dmitry Baranchuk

Diffusion and flow matching models have significantly advanced media generation, yet their design space is well-explored, somewhat limiting further improvements. Concurrently, autoregressive (AR) models, particularly those generating…

Machine Learning · Computer Science · 2025-07-01 · Neta Shaul, Uriel Singer, Itai Gat, Yaron Lipman

Current image captioning works usually focus on generating descriptions in an autoregressive manner. However, there are limited works that focus on generating descriptions non-autoregressively, which brings more decoding diversity. Inspired…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-23 · Yufeng He, Zefan Cai, Xu Gan, Baobao Chang

Autoregressive (AR) image generators offer a language-model-friendly approach to image generation by predicting discrete image tokens in a causal sequence. However, unlike diffusion models, AR models lack a mechanism to refine previous…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-29 · Cheng Cheng, Lin Song, Di An, Yicheng Xiao, Xuchong Zhang, Hongbin Sun, Ying Shan

Diffusion models have gained significant attention in the realm of image generation due to their exceptional performance. Their success has been recently expanded to text generation via generating all tokens within a sequence concurrently.…

Computation and Language · Computer Science · 2023-12-14 · Tong Wu, Zhihao Fan, Xiao Liu, Yeyun Gong, Yelong Shen, Jian Jiao, Hai-Tao Zheng, Juntao Li, Zhongyu Wei, Jian Guo, Nan Duan, Weizhu Chen

In this work, we propose Causal Autoregressive Diffusion (CARD), a novel framework that unifies the training efficiency of ARMs with the high-throughput inference of diffusion models. CARD reformulates the diffusion process within a…

Computation and Language · Computer Science · 2026-01-30 · Junhao Ruan, Bei Li, Yongjing Yin, Pengcheng Huang, Xin Chen, Jingang Wang, Xunliang Cai, Tong Xiao, JingBo Zhu

Diffusion probabilistic models (DPMs) have become the state-of-the-art in high-quality image generation. However, DPMs have an arbitrary noisy latent space with no interpretable or controllable semantics. Although there has been significant…

Machine Learning · Computer Science · 2024-08-27 · Aneesh Komanduri, Chen Zhao, Feng Chen, Xintao Wu

Autoregressive models (ARMs) are hindered by slow sequential inference. While masked diffusion models (MDMs) offer a parallel alternative, they suffer from critical drawbacks: high computational overhead from precluding Key-Value (KV)…

Computation and Language · Computer Science · 2026-03-06 · Jia-Nan Li, Jian Guan, Wei Wu, Chongxuan Li

Conventional wisdom holds that autoregressive models for image generation are typically accompanied by vector-quantized tokens. We observe that while a discrete-valued space can facilitate representing a categorical distribution, it is not…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-04 · Tianhong Li, Yonglong Tian, He Li, Mingyang Deng, Kaiming He

Diffusion models promise efficient parallel text generation but rely on bidirectional attention, creating a structural mismatch with pre-trained Autoregressive (AR) models. This incompatibility precludes reusing robust AR priors,…

Computation and Language · Computer Science · 2026-05-29 · Xiangyu Ma, Teng Xiao, Zuchao Li, Lefei Zhang

In this paper, we introduce a novel generative model, Diffusion Layout Transformers without Autoencoder (Dolfin), which significantly improves the modeling capability with reduced complexity compared to existing methods. Dolfin employs a…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-26 · Yilin Wang, Zeyuan Chen, Liangjun Zhong, Zheng Ding, Zhizhou Sha, Zhuowen Tu

Recent deep learning approaches seek to automate CAD creation by representing a model as a sequence of discrete commands and parameters, and then generating them using autoregressive models or continuous diffusion operating in Euclidean…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-07 · Honghu Pan, Xiaoling Luo, Yongyong Chen, Zhenyu He, Pengyang Wang

Multimodal autoregressive (AR) models, based on next-token prediction and transformer architecture, have demonstrated remarkable capabilities in various multimodal tasks including text-to-image (T2I) generation. Despite their strong…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-02 · Yi Wu, Shengju Qian, Lingting Zhu, Lei Liu, Wandi Qiao, Ziqiang Li, Lequan Yu, Bin Li

Autoregressive models excel in efficiency and plug directly into the transformer ecosystem, delivering robust generalization, predictable scalability, and seamless workflows such as fine-tuning and parallelized training. However, they…

Machine Learning · Computer Science · 2025-06-13 · Samuel Belkadi, Steve Hong, Marian Chen, Miruna Cretu, Charles Harris, Pietro Lio