Related papers: Progressive Autoregressive Video Diffusion Models

Self-Forcing++: Towards Minute-Scale High-Quality Video Generation

Diffusion models have revolutionized image and video generation, achieving unprecedented visual quality. However, their reliance on transformer architectures incurs prohibitively high computational costs, particularly when extending…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-03 · Justin Cui, Jie Wu, Ming Li, Tao Yang, Xiaojie Li, Rui Wang, Andrew Bai, Yuanhao Ban, Cho-Jui Hsieh

From Slow Bidirectional to Fast Autoregressive Video Diffusion Models

Current video diffusion models achieve impressive generation quality but struggle in interactive applications due to bidirectional attention dependencies. The generation of a single frame requires the model to process the entire sequence,…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-25 · Tianwei Yin, Qiang Zhang, Richard Zhang, William T. Freeman, Fredo Durand, Eli Shechtman, Xun Huang

DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes

Recent advances in diffusion models have improved controllable streetscape generation and supported downstream perception and planning tasks. However, challenges remain in accurately modeling driving scenes and generating long videos. To…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-30 · Jianbiao Mei, Tao Hu, Xuemeng Yang, Licheng Wen, Yu Yang, Tiantian Wei, Yukai Ma, Min Dou, Botian Shi, Yong Liu

Streaming Autoregressive Video Generation via Diagonal Distillation

Large pretrained diffusion models have significantly enhanced the quality of generated videos, and yet their use in real-time streaming remains limited. Autoregressive models offer a natural framework for sequential frame synthesis but…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-12 · Jinxiu Liu, Xuanming Liu, Kangfu Mei, Yandong Wen, Ming-Hsuan Yang, Weiyang Liu

Long-Context Autoregressive Video Modeling with Next-Frame Prediction

Long-context video modeling is essential for enabling generative models to function as world simulators, as they must maintain temporal coherence over extended time spans. However, most existing models are trained on short clips, limiting…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-20 · Yuchao Gu, Weijia Mao, Mike Zheng Shou

ViD-GPT: Introducing GPT-style Autoregressive Generation in Video Diffusion Models

With the advance of diffusion models, today's video generation has achieved impressive quality. But generating temporally consistent long videos is still challenging. A majority of video diffusion models (VDMs) generate long videos in an…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-18 · Kaifeng Gao, Jiaxin Shi, Hanwang Zhang, Chunping Wang, Jun Xiao

VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction

Recent advances in video generation have been dominated by diffusion and flow-matching models, which produce high-quality results but remain computationally intensive and difficult to scale. In this work, we introduce VideoAR, the first…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-15 · Longbin Ji, Xiaoxiong Liu, Junyuan Shang, Shuohuan Wang, Yu Sun, Hua Wu, Haifeng Wang

FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling

With the availability of large-scale video datasets and the advances of diffusion models, text-driven video generation has achieved substantial progress. However, existing video generation models are typically trained on a limited number of…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-31 · Haonan Qiu, Menghan Xia, Yong Zhang, Yingqing He, Xintao Wang, Ying Shan, Ziwei Liu

Playing with Transformer at 30+ FPS via Next-Frame Diffusion

Autoregressive video models offer distinct advantages over bidirectional diffusion models in creating interactive video content and supporting streaming applications with arbitrary duration. In this work, we present Next-Frame Diffusion…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-08 · Xinle Cheng, Tianyu He, Jiayi Xu, Junliang Guo, Di He, Jiang Bian

Diffusion Probabilistic Modeling for Video Generation

Denoising diffusion probabilistic models are a promising new class of generative models that mark a milestone in high-quality image generation. This paper showcases their ability to sequentially generate video, surpassing prior methods in…

Computer Vision and Pattern Recognition · Computer Science · 2022-12-09 · Ruihan Yang, Prakhar Srivastava, Stephan Mandt

DiffVC: A Non-autoregressive Framework Based on Diffusion Model for Video Captioning

Current video captioning methods usually use an encoder-decoder structure to generate text autoregressively. However, autoregressive methods have inherent limitations such as slow generation speed and large cumulative error. Furthermore,…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-10 · Junbo Wang, Liangyu Fu, Yuke Li, Yining Zhu, Ya Jing, Xuecheng Wu, Jiangbin Zheng

AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion

The task of video generation requires synthesizing visually realistic and temporally coherent video frames. Existing methods primarily use asynchronous auto-regressive models or synchronous diffusion models to address this challenge.…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-11 · Mingzhen Sun, Weining Wang, Gen Li, Jiawei Liu, Jiahui Sun, Wanquan Feng, Shanshan Lao, SiYu Zhou, Qian He, Jing Liu

Rolling Forcing: Autoregressive Long Video Diffusion in Real Time

Streaming video generation, as one fundamental component in interactive world models and neural game engines, aims to generate high-quality, low-latency, and temporally coherent long video streams. However, most existing work suffers from…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-30 · Kunhao Liu, Wenbo Hu, Jiale Xu, Ying Shan, Shijian Lu

VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation

A diffusion probabilistic model (DPM), which constructs a forward diffusion process by gradually adding noise to data points and learns the reverse denoising process to generate new samples, has been shown to handle complex data…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-16 · Zhengxiong Luo, Dayou Chen, Yingya Zhang, Yan Huang, Liang Wang, Yujun Shen, Deli Zhao, Jingren Zhou, Tieniu Tan

Loong: Generating Minute-level Long Videos with Autoregressive Language Models

It is desirable but challenging to generate content-rich long videos in the scale of minutes. Autoregressive large language models (LLMs) have achieved great success in generating coherent and long sequences of tokens in the domain of…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-03 · Yuqing Wang, Tianwei Xiong, Daquan Zhou, Zhijie Lin, Yang Zhao, Bingyi Kang, Jiashi Feng, Xihui Liu

Efficient Autoregressive Video Diffusion with Dummy Head

The autoregressive video diffusion model has recently gained considerable research interest due to its causal modeling and iterative denoising. In this work, we identify that the multi-head self-attention in these models under-utilizes…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-29 · Hang Guo, Zhaoyang Jia, Jiahao Li, Bin Li, Yuanhao Cai, Jiangshan Wang, Yawei Li, Yan Lu

Frame-Level Captions for Long Video Generation with Complex Multi Scenes

Generating long videos that can show complex stories, like movie scenes from scripts, has great promise and offers much more than short clips. However, current methods that use autoregression with diffusion models often struggle because…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-28 · Guangcong Zheng, Jianlong Yuan, Bo Wang, Haoyang Huang, Guoqing Ma, Nan Duan

Latent Video Diffusion Models for High-Fidelity Long Video Generation

AI-generated content has attracted lots of attention recently, but photo-realistic video synthesis is still challenging. Although many attempts using GANs and autoregressive models have been made in this area, the visual quality and length…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-21 · Yingqing He, Tianyu Yang, Yong Zhang, Ying Shan, Qifeng Chen

DiffuseSlide: Training-Free High Frame Rate Video Generation Diffusion

Recent advancements in diffusion models have revolutionized video generation, enabling the creation of high-quality, temporally consistent videos. However, generating high frame-rate (FPS) videos remains a significant challenge due to…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-03 · Geunmin Hwang, Hyun-kyu Ko, Younghyun Kim, Seungryong Lee, Eunbyung Park

Video-Infinity: Distributed Long Video Generation

Diffusion models have recently achieved remarkable results for video generation. Despite the encouraging performances, the generated videos are typically constrained to a small number of frames, resulting in clips lasting merely a few…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-25 · Zhenxiong Tan, Xingyi Yang, Songhua Liu, Xinchao Wang