
Related papers: Progressive Autoregressive Video Diffusion Models

200 papers

Diffusion models have revolutionized image and video generation, achieving unprecedented visual quality. However, their reliance on transformer architectures incurs prohibitively high computational costs, particularly when extending…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-03 · Justin Cui, Jie Wu, Ming Li, Tao Yang, Xiaojie Li, Rui Wang, Andrew Bai, Yuanhao Ban, Cho-Jui Hsieh

Current video diffusion models achieve impressive generation quality but struggle in interactive applications due to bidirectional attention dependencies. The generation of a single frame requires the model to process the entire sequence,…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-25 · Tianwei Yin, Qiang Zhang, Richard Zhang, William T. Freeman, Fredo Durand, Eli Shechtman, Xun Huang

Recent advances in diffusion models have improved controllable streetscape generation and supported downstream perception and planning tasks. However, challenges remain in accurately modeling driving scenes and generating long videos. To…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-30 · Jianbiao Mei, Tao Hu, Xuemeng Yang, Licheng Wen, Yu Yang, Tiantian Wei, Yukai Ma, Min Dou, Botian Shi, Yong Liu

Large pretrained diffusion models have significantly enhanced the quality of generated videos, and yet their use in real-time streaming remains limited. Autoregressive models offer a natural framework for sequential frame synthesis but…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-12 · Jinxiu Liu, Xuanming Liu, Kangfu Mei, Yandong Wen, Ming-Hsuan Yang, Weiyang Liu

Long-context video modeling is essential for enabling generative models to function as world simulators, as they must maintain temporal coherence over extended time spans. However, most existing models are trained on short clips, limiting…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-20 · Yuchao Gu, Weijia Mao, Mike Zheng Shou

With the advance of diffusion models, today's video generation has achieved impressive quality. However, generating temporally consistent long videos remains challenging. A majority of video diffusion models (VDMs) generate long videos in an…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-18 · Kaifeng Gao, Jiaxin Shi, Hanwang Zhang, Chunping Wang, Jun Xiao

Recent advances in video generation have been dominated by diffusion and flow-matching models, which produce high-quality results but remain computationally intensive and difficult to scale. In this work, we introduce VideoAR, the first…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-15 · Longbin Ji, Xiaoxiong Liu, Junyuan Shang, Shuohuan Wang, Yu Sun, Hua Wu, Haifeng Wang

With the availability of large-scale video datasets and the advances of diffusion models, text-driven video generation has achieved substantial progress. However, existing video generation models are typically trained on a limited number of…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-31 · Haonan Qiu, Menghan Xia, Yong Zhang, Yingqing He, Xintao Wang, Ying Shan, Ziwei Liu

Autoregressive video models offer distinct advantages over bidirectional diffusion models in creating interactive video content and supporting streaming applications with arbitrary duration. In this work, we present Next-Frame Diffusion…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-08 · Xinle Cheng, Tianyu He, Jiayi Xu, Junliang Guo, Di He, Jiang Bian

Denoising diffusion probabilistic models are a promising new class of generative models that mark a milestone in high-quality image generation. This paper showcases their ability to sequentially generate video, surpassing prior methods in…

Computer Vision and Pattern Recognition · Computer Science · 2022-12-09 · Ruihan Yang, Prakhar Srivastava, Stephan Mandt

Current video captioning methods usually use an encoder-decoder structure to generate text autoregressively. However, autoregressive methods have inherent limitations such as slow generation speed and large cumulative error. Furthermore,…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-10 · Junbo Wang, Liangyu Fu, Yuke Li, Yining Zhu, Ya Jing, Xuecheng Wu, Jiangbin Zheng

The task of video generation requires synthesizing visually realistic and temporally coherent video frames. Existing methods primarily use asynchronous auto-regressive models or synchronous diffusion models to address this challenge…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-11 · Mingzhen Sun, Weining Wang, Gen Li, Jiawei Liu, Jiahui Sun, Wanquan Feng, Shanshan Lao, SiYu Zhou, Qian He, Jing Liu

Streaming video generation, as one fundamental component in interactive world models and neural game engines, aims to generate high-quality, low-latency, and temporally coherent long video streams. However, most existing work suffers from…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-30 · Kunhao Liu, Wenbo Hu, Jiale Xu, Ying Shan, Shijian Lu

A diffusion probabilistic model (DPM), which constructs a forward diffusion process by gradually adding noise to data points and learns the reverse denoising process to generate new samples, has been shown to handle complex data…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-16 · Zhengxiong Luo, Dayou Chen, Yingya Zhang, Yan Huang, Liang Wang, Yujun Shen, Deli Zhao, Jingren Zhou, Tieniu Tan

It is desirable but challenging to generate content-rich long videos on the scale of minutes. Autoregressive large language models (LLMs) have achieved great success in generating coherent and long sequences of tokens in the domain of…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-03 · Yuqing Wang, Tianwei Xiong, Daquan Zhou, Zhijie Lin, Yang Zhao, Bingyi Kang, Jiashi Feng, Xihui Liu

The autoregressive video diffusion model has recently gained considerable research interest due to its causal modeling and iterative denoising. In this work, we identify that the multi-head self-attention in these models under-utilizes…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-29 · Hang Guo, Zhaoyang Jia, Jiahao Li, Bin Li, Yuanhao Cai, Jiangshan Wang, Yawei Li, Yan Lu

Generating long videos that can show complex stories, like movie scenes from scripts, has great promise and offers much more than short clips. However, current methods that use autoregression with diffusion models often struggle because…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-28 · Guangcong Zheng, Jianlong Yuan, Bo Wang, Haoyang Huang, Guoqing Ma, Nan Duan

AI-generated content has attracted lots of attention recently, but photo-realistic video synthesis is still challenging. Although many attempts using GANs and autoregressive models have been made in this area, the visual quality and length…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-21 · Yingqing He, Tianyu Yang, Yong Zhang, Ying Shan, Qifeng Chen

Recent advancements in diffusion models have revolutionized video generation, enabling the creation of high-quality, temporally consistent videos. However, generating high frame-rate (FPS) videos remains a significant challenge due to…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-03 · Geunmin Hwang, Hyun-kyu Ko, Younghyun Kim, Seungryong Lee, Eunbyung Park

Diffusion models have recently achieved remarkable results for video generation. Despite the encouraging performances, the generated videos are typically constrained to a small number of frames, resulting in clips lasting merely a few…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-25 · Zhenxiong Tan, Xingyi Yang, Songhua Liu, Xinchao Wang