Related papers: Enabling Versatile Controls for Video Diffusion Mo…

200 papers

Video Diffusion Models have been developed for video generation, usually integrating text and image conditioning to enhance control over the generated content. Despite the progress, ensuring consistency across frames remains a challenge,…

Computer Vision and Pattern Recognition · Computer Science 2024-11-12 Tian Xia, Xuweiyi Chen, Sihan Xu

Controllability plays a crucial role in video generation, as it allows users to create and edit content more precisely. Existing models, however, lack control of camera pose that serves as a cinematic language to express deeper narrative…

Computer Vision and Pattern Recognition · Computer Science 2025-03-17 Hao He, Yinghao Xu, Yuwei Guo, Gordon Wetzstein, Bo Dai, Hongsheng Li, Ceyuan Yang

We introduce MVControl, a novel neural network architecture that enhances existing pre-trained multi-view 2D diffusion models by incorporating additional input conditions, e.g. edge maps. Our approach enables the generation of controllable…

Computer Vision and Pattern Recognition · Computer Science 2023-11-29 Zhiqi Li, Yiming Chen, Lingzhe Zhao, Peidong Liu

The development of text-to-video (T2V), i.e., generating videos with a given text prompt, has been significantly advanced in recent years. However, relying solely on text prompts often results in ambiguous frame composition due to spatial…

Computer Vision and Pattern Recognition · Computer Science 2023-11-29 Yuwei Guo, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, Bo Dai

Following the advancements in text-guided image generation technology exemplified by Stable Diffusion, video generation is gaining increased attention in the academic community. However, relying solely on text guidance for video generation…

Computer Vision and Pattern Recognition · Computer Science 2024-09-17 Cong Wang, Jiaxi Gu, Panwen Hu, Haoyu Zhao, Yuanfan Guo, Jianhua Han, Hang Xu, Xiaodan Liang

ControlNets are widely used for adding spatial control to text-to-image diffusion models with different conditions, such as depth maps, scribbles/sketches, and human poses. However, when it comes to controllable video generation,…

Computer Vision and Pattern Recognition · Computer Science 2024-05-27 Han Lin, Jaemin Cho, Abhay Zala, Mohit Bansal

Existing video generation models excel at producing photo-realistic videos from text or images, but often lack physical plausibility and 3D controllability. To overcome these limitations, we introduce PhysCtrl, a novel framework for…

Computer Vision and Pattern Recognition · Computer Science 2025-11-11 Chen Wang, Chuhao Chen, Yiming Huang, Zhiyang Dou, Yuan Liu, Jiatao Gu, Lingjie Liu

We tackle the dual challenges of video understanding and controllable video generation within a unified diffusion framework. Our key insights are two-fold: geometry-only cues (e.g., depth, edges) are insufficient: they specify layout but…

Computer Vision and Pattern Recognition · Computer Science 2025-11-27 Dianbing Xi, Jiepeng Wang, Yuanzhi Liang, Xi Qiu, Jialun Liu, Hao Pan, Yuchi Huo, Rui Wang, Haibin Huang, Chi Zhang, Xuelong Li

Diffusion models have recently become the dominant paradigm for image generation, yet existing systems struggle to interpret and follow numeric instructions for adjusting semantic attributes. In real-world creative scenarios, especially…

Computer Vision and Pattern Recognition · Computer Science 2026-01-12 Die Chen, Zhongjie Duan, Zhiwen Li, Cen Chen, Daoyuan Chen, Yaliang Li, Yingda Chen

Visual generation includes both image and video generation, training probabilistic models to create coherent, diverse, and semantically faithful content from scratch. While early research focused on unconditional sampling, practitioners now…

Computer Vision and Pattern Recognition · Computer Science 2025-12-09 Zixiang Yang, Yue Ma, Yinhan Zhang, Shanhui Mo, Dongrui Liu, Linfeng Zhang

Recently, diffusion models like StableDiffusion have achieved impressive image generation results. However, the generation process of such diffusion models is uncontrollable, which makes it hard to generate videos with continuous and…

Computer Vision and Pattern Recognition · Computer Science 2023-08-04 Zhihao Hu, Dong Xu

Recent diffusion models have achieved remarkable success in image relighting, and this success has quickly been extended to video relighting. However, existing methods offer limited explicit control over illumination in the relighted…

Computer Vision and Pattern Recognition · Computer Science 2026-03-31 Yizuo Peng, Xuelin Chen, Kai Zhang, Xiaodong Cun

Leveraging text, images, structure maps, or motion trajectories as conditional guidance, diffusion models have achieved great success in automated and high-quality video generation. However, generating smooth and rational transition videos…

Computer Vision and Pattern Recognition · Computer Science 2025-08-05 Zuhao Yang, Jiahui Zhang, Yingchen Yu, Shijian Lu, Song Bai

Research on video generation has recently made tremendous progress, enabling high-quality videos to be generated from text prompts or images. Adding control to the video generation process is an important goal moving forward and recent…

Computer Vision and Pattern Recognition · Computer Science 2024-05-28 Zhengfei Kuang, Shengqu Cai, Hao He, Yinghao Xu, Hongsheng Li, Leonidas Guibas, Gordon Wetzstein

We address the challenge of novel view synthesis from only two input images under large viewpoint changes. Existing regression-based methods lack the capacity to reconstruct unseen regions, while camera-guided diffusion models often deviate…

Computer Vision and Pattern Recognition · Computer Science 2026-03-11 Liudi Yang, George Eskandar, Fengyi Shen, Mohammad Altillawi, Yang Bai, Chi Zhang, Ziyuan Liu, Abhinav Valada

With the rapid development of AI-generated content (AIGC), video generation has emerged as one of its most dynamic and impactful subfields. In particular, the advancement of video generation foundation models has led to growing demand for…

We propose a training-free and robust solution to offer camera movement control for off-the-shelf video diffusion models. Unlike previous work, our method does not require any supervised finetuning on camera-annotated datasets or…

Computer Vision and Pattern Recognition · Computer Science 2025-02-26 Chen Hou, Zhibo Chen

Cinematic storytelling is profoundly shaped by the artful manipulation of photographic elements such as depth of field and exposure. These effects are crucial in conveying mood and creating aesthetic appeal. However, controlling these…

Computer Vision and Pattern Recognition · Computer Science 2025-11-18 Huiqiang Sun, Liao Shen, Zhan Peng, Kun Wang, Size Wu, Yuhang Zang, Tianqi Liu, Zihao Huang, Xingyu Zeng, Zhiguo Cao, Wei Li, Chen Change Loy

High-fidelity generative video editing has seen significant quality improvements by leveraging pre-trained video foundation models. However, their computational cost is a major bottleneck, as they are often designed to inefficiently process…

Computer Vision and Pattern Recognition · Computer Science 2026-04-02 Yehonathan Litman, Shikun Liu, Dario Seyb, Nicholas Milef, Yang Zhou, Carl Marshall, Shubham Tulsiani, Caleb Leak

Recent advances in text-to-image (T2I) diffusion models have enabled impressive image generation capabilities guided by text prompts. However, extending these techniques to video generation remains challenging, with existing text-to-video…

Computer Vision and Pattern Recognition · Computer Science 2024-08-13 Weifeng Chen, Yatai Ji, Jie Wu, Hefeng Wu, Pan Xie, Jiashi Li, Xin Xia, Xuefeng Xiao, Liang Lin