Related papers: SneakPeek: Future-Guided Instructional Streaming V…

Optical-Flow Guided Prompt Optimization for Coherent Video Generation

While text-to-video diffusion models have made significant strides, many still face challenges in generating videos with temporal consistency. Within diffusion frameworks, guidance techniques have proven effective in enhancing output…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Hyelin Nam, Jaemin Kim, Dohun Lee, Jong Chul Ye

A Survey: Spatiotemporal Consistency in Video Generation

Video generation aims to produce temporally coherent sequences of visual frames, representing a pivotal advancement in Artificial Intelligence Generated Content (AIGC). Compared to static image generation, video generation poses unique…

Computer Vision and Pattern Recognition · Computer Science 2026-02-19 Zhiyu Yin, Kehai Chen, Xuefeng Bai, Ruili Jiang, Juntao Li, Hongdong Li, Jin Liu, Yang Xiang, Jun Yu, Min Zhang

SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction

Recently video generation has achieved substantial progress with realistic results. Nevertheless, existing AI-generated videos are usually very short clips ("shot-level") depicting a single scene. To deliver a coherent long video…

Computer Vision and Pattern Recognition · Computer Science 2023-11-07 Xinyuan Chen, Yaohui Wang, Lingjun Zhang, Shaobin Zhuang, Xin Ma, Jiashuo Yu, Yali Wang, Dahua Lin, Yu Qiao, Ziwei Liu

Video Diffusion Models

Generating temporally coherent high fidelity video is an important milestone in generative modeling research. We make progress towards this milestone by proposing a diffusion model for video generation that shows very promising initial…

Computer Vision and Pattern Recognition · Computer Science 2022-06-24 Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, David J. Fleet

In-Video Instructions: Visual Signals as Generative Control

Large-scale video generative models have recently demonstrated strong visual capabilities, enabling the prediction of future frames that adhere to the logical and physical cues in the current observation. In this work, we investigate…

Computer Vision and Pattern Recognition · Computer Science 2025-11-25 Gongfan Fang, Xinyin Ma, Xinchao Wang

Motion Prompting: Controlling Video Generation with Motion Trajectories

Motion control is crucial for generating expressive and compelling video content; however, most existing video generation models rely mainly on text prompts for control, which struggle to capture the nuances of dynamic actions and temporal…

Computer Vision and Pattern Recognition · Computer Science 2025-03-31 Daniel Geng, Charles Herrmann, Junhwa Hur, Forrester Cole, Serena Zhang, Tobias Pfaff, Tatiana Lopez-Guevara, Carl Doersch, Yusuf Aytar, Michael Rubinstein, Chen Sun, Oliver Wang, Andrew Owens, Deqing Sun

Resource-Efficient Motion Control for Video Generation via Dynamic Mask Guidance

Recent advances in diffusion models bring new vitality to visual content creation. However, current text-to-video generation models still face significant challenges such as high training costs, substantial data requirements, and…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Sicong Feng, Jielong Yang, Li Peng

Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance

Creating a vivid video from the event or scenario in our imagination is a truly fascinating experience. Recent advancements in text-to-video synthesis have unveiled the potential to achieve this with prompts only. While text is convenient…

Computer Vision and Pattern Recognition · Computer Science 2023-06-02 Jinbo Xing, Menghan Xia, Yuxin Liu, Yuechen Zhang, Yong Zhang, Yingqing He, Hanyuan Liu, Haoxin Chen, Xiaodong Cun, Xintao Wang, Ying Shan, Tien-Tsin Wong

Structure and Content-Guided Video Synthesis with Diffusion Models

Text-guided generative diffusion models unlock powerful image creation and editing tools. While these have been extended to video generation, current approaches that edit the content of existing footage while retaining structure require…

Computer Vision and Pattern Recognition · Computer Science 2023-02-07 Patrick Esser, Johnathan Chiu, Parmida Atighehchian, Jonathan Granskog, Anastasis Germanidis

InstructVid2Vid: Controllable Video Editing with Natural Language Instructions

We introduce InstructVid2Vid, an end-to-end diffusion-based methodology for video editing guided by human language instructions. Our approach empowers video manipulation guided by natural language directives, eliminating the need for…

Computer Vision and Pattern Recognition · Computer Science 2024-05-30 Bosheng Qin, Juncheng Li, Siliang Tang, Tat-Seng Chua, Yueting Zhuang

MEVG: Multi-event Video Generation with Text-to-Video Models

We introduce a novel diffusion-based video generation method, generating a video showing multiple events given multiple individual sentences from the user. Our method does not require a large-scale video dataset since our method uses a…

Computer Vision and Pattern Recognition · Computer Science 2024-07-17 Gyeongrok Oh, Jaehwan Jeong, Sieun Kim, Wonmin Byeon, Jinkyu Kim, Sungwoong Kim, Sangpil Kim

MotionStream: Real-Time Video Generation with Interactive Motion Controls

Current motion-conditioned video generation methods suffer from prohibitive latency (minutes per video) and non-causal processing that prevents real-time interaction. We present MotionStream, enabling sub-second latency with up to 29 FPS…

Computer Vision and Pattern Recognition · Computer Science 2026-03-06 Joonghyuk Shin, Zhengqi Li, Richard Zhang, Jun-Yan Zhu, Jaesik Park, Eli Shechtman, Xun Huang

GePSAn: Generative Procedure Step Anticipation in Cooking Videos

We study the problem of future step anticipation in procedural videos. Given a video of an ongoing procedural activity, we predict a plausible next procedure step described in rich natural language. While most previous work focuses on the…

Computer Vision and Pattern Recognition · Computer Science 2023-10-13 Mohamed Ashraf Abdelsalam, Samrudhdhi B. Rangrej, Isma Hadji, Nikita Dvornik, Konstantinos G. Derpanis, Afsaneh Fazly

InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions

We introduce InteractiveVideo, a user-centric framework for video generation. Different from traditional generative approaches that operate based on user-provided images or text, our framework is designed for dynamic interaction,…

Computer Vision and Pattern Recognition · Computer Science 2024-02-06 Yiyuan Zhang, Yuhao Kang, Zhixin Zhang, Xiaohan Ding, Sanyuan Zhao, Xiangyu Yue

Contrastive Sequential-Diffusion Learning: Non-linear and Multi-Scene Instructional Video Synthesis

Generated video scenes for action-centric sequence descriptions, such as recipe instructions and do-it-yourself projects, often include non-linear patterns, where the next video may need to be visually consistent not with the immediately…

Computer Vision and Pattern Recognition · Computer Science 2024-12-10 Vasco Ramos, Yonatan Bitton, Michal Yarom, Idan Szpektor, Joao Magalhaes

MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space

This paper addresses the challenge of text-conditioned streaming motion generation, which requires us to predict the next-step human pose based on variable-length historical motions and incoming texts. Existing methods struggle to achieve…

Computer Vision and Pattern Recognition · Computer Science 2025-08-08 Lixing Xiao, Shunlin Lu, Huaijin Pi, Ke Fan, Liang Pan, Yueer Zhou, Ziyong Feng, Xiaowei Zhou, Sida Peng, Jingbo Wang

ViewPoint: Panoramic Video Generation with Pretrained Diffusion Models

Panoramic video generation aims to synthesize 360-degree immersive videos, holding significant importance in the fields of VR, world models, and spatial intelligence. Existing works fail to synthesize high-quality panoramic videos due to…

Computer Vision and Pattern Recognition · Computer Science 2025-07-01 Zixun Fang, Kai Zhu, Zhiheng Liu, Yu Liu, Wei Zhai, Yang Cao, Zheng-Jun Zha

Generating Long Videos of Dynamic Scenes

We present a video generation model that accurately reproduces object motion, changes in camera viewpoint, and new content that arises over time. Existing video generation methods often fail to produce new content as a function of time…

Computer Vision and Pattern Recognition · Computer Science 2022-06-10 Tim Brooks, Janne Hellsten, Miika Aittala, Ting-Chun Wang, Timo Aila, Jaakko Lehtinen, Ming-Yu Liu, Alexei A. Efros, Tero Karras

Streaming Autoregressive Video Generation via Diagonal Distillation

Large pretrained diffusion models have significantly enhanced the quality of generated videos, and yet their use in real-time streaming remains limited. Autoregressive models offer a natural framework for sequential frame synthesis but…

Computer Vision and Pattern Recognition · Computer Science 2026-03-12 Jinxiu Liu, Xuanming Liu, Kangfu Mei, Yandong Wen, Ming-Hsuan Yang, Weiyang Liu

Controllable Video Generation: A Survey

With the rapid development of AI-generated content (AIGC), video generation has emerged as one of its most dynamic and impactful subfields. In particular, the advancement of video generation foundation models has led to growing demand for…

Graphics · Computer Science 2026-01-21 Yue Ma, Kunyu Feng, Zhongyuan Hu, Xinyu Wang, Yucheng Wang, Mingzhe Zheng, Bingyuan Wang, Qinghe Wang, Xuanhua He, Hongfa Wang, Chenyang Zhu, Hongyu Liu, Yingqing He, Zeyu Wang, Zhifeng Li, Xiu Li, Sirui Han, Yike Guo, Wei Liu, Dan Xu, Linfeng Zhang, Qifeng Chen