Related papers: Enabling Versatile Controls for Video Diffusion Mo…

UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control

Video Diffusion Models have been developed for video generation, usually integrating text and image conditioning to enhance control over the generated content. Despite the progress, ensuring consistency across frames remains a challenge,…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-12 · Tian Xia, Xuweiyi Chen, Sihan Xu

CameraCtrl: Enabling Camera Control for Text-to-Video Generation

Controllability plays a crucial role in video generation, as it allows users to create and edit content more precisely. Existing models, however, lack control of camera pose that serves as a cinematic language to express deeper narrative…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-17 · Hao He, Yinghao Xu, Yuwei Guo, Gordon Wetzstein, Bo Dai, Hongsheng Li, Ceyuan Yang

MVControl: Adding Conditional Control to Multi-view Diffusion for Controllable Text-to-3D Generation

We introduce MVControl, a novel neural network architecture that enhances existing pre-trained multi-view 2D diffusion models by incorporating additional input conditions, e.g. edge maps. Our approach enables the generation of controllable…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-29 · Zhiqi Li, Yiming Chen, Lingzhe Zhao, Peidong Liu

SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models

The development of text-to-video (T2V), i.e., generating videos with a given text prompt, has been significantly advanced in recent years. However, relying solely on text prompts often results in ambiguous frame composition due to spatial…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-29 · Yuwei Guo, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, Bo Dai

EasyControl: Transfer ControlNet to Video Diffusion for Controllable Generation and Interpolation

Following the advancements in text-guided image generation technology exemplified by Stable Diffusion, video generation is gaining increased attention in the academic community. However, relying solely on text guidance for video generation…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-17 · Cong Wang, Jiaxi Gu, Panwen Hu, Haoyu Zhao, Yuanfan Guo, Jianhua Han, Hang Xu, Xiaodan Liang

Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

ControlNets are widely used for adding spatial control to text-to-image diffusion models with different conditions, such as depth maps, scribbles/sketches, and human poses. However, when it comes to controllable video generation,…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-27 · Han Lin, Jaemin Cho, Abhay Zala, Mohit Bansal

PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation

Existing video generation models excel at producing photo-realistic videos from text or images, but often lack physical plausibility and 3D controllability. To overcome these limitations, we introduce PhysCtrl, a novel framework for…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-11 · Chen Wang, Chuhao Chen, Yiming Huang, Zhiyang Dou, Yuan Liu, Jiatao Gu, Lingjie Liu

CtrlVDiff: Controllable Video Generation via Unified Multimodal Video Diffusion

We tackle the dual challenges of video understanding and controllable video generation within a unified diffusion framework. Our key insights are two-fold: geometry-only cues (e.g., depth, edges) are insufficient: they specify layout but…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-27 · Dianbing Xi, Jiepeng Wang, Yuanzhi Liang, Xi Qiu, Jialun Liu, Hao Pan, Yuchi Huo, Rui Wang, Haibin Huang, Chi Zhang, Xuelong Li

AttriCtrl: Fine-Grained Control of Aesthetic Attribute Intensity in Diffusion Models

Diffusion models have recently become the dominant paradigm for image generation, yet existing systems struggle to interpret and follow numeric instructions for adjusting semantic attributes. In real-world creative scenarios, especially…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-12 · Die Chen, Zhongjie Duan, Zhiwen Li, Cen Chen, Daoyuan Chen, Yaliang Li, Yingda Chen

EVCtrl: Efficient Control Adapter for Visual Generation

Visual generation includes both image and video generation, training probabilistic models to create coherent, diverse, and semantically faithful content from scratch. While early research focused on unconditional sampling, practitioners now…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-09 · Zixiang Yang, Yue Ma, Yinhan Zhang, Shanhui Mo, Dongrui Liu, Linfeng Zhang

VideoControlNet: A Motion-Guided Video-to-Video Translation Framework by Using Diffusion Model with ControlNet

Recently, diffusion models like StableDiffusion have achieved impressive image generation results. However, the generation process of such diffusion models is uncontrollable, which makes it hard to generate videos with continuous and…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-04 · Zhihao Hu, Dong Xu

LightCtrl: Training-free Controllable Video Relighting

Recent diffusion models have achieved remarkable success in image relighting, and this success has quickly been extended to video relighting. However, existing methods offer limited explicit control over illumination in the relighted…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-31 · Yizuo Peng, Xuelin Chen, Kai Zhang, Xiaodong Cun

Versatile Transition Generation with Image-to-Video Diffusion

Leveraging text, images, structure maps, or motion trajectories as conditional guidance, diffusion models have achieved great success in automated and high-quality video generation. However, generating smooth and rational transition videos…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-05 · Zuhao Yang, Jiahui Zhang, Yingchen Yu, Shijian Lu, Song Bai

Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

Research on video generation has recently made tremendous progress, enabling high-quality videos to be generated from text prompts or images. Adding control to the video generation process is an important goal moving forward and recent…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-28 · Zhengfei Kuang, Shengqu Cai, Hao He, Yinghao Xu, Hongsheng Li, Leonidas Guibas, Gordon Wetzstein

ConfCtrl: Enabling Precise Camera Control in Video Diffusion via Confidence-Aware Interpolation

We address the challenge of novel view synthesis from only two input images under large viewpoint changes. Existing regression-based methods lack the capacity to reconstruct unseen regions, while camera-guided diffusion models often deviate…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-11 · Liudi Yang, George Eskandar, Fengyi Shen, Mohammad Altillawi, Yang Bai, Chi Zhang, Ziyuan Liu, Abhinav Valada

Controllable Video Generation: A Survey

With the rapid development of AI-generated content (AIGC), video generation has emerged as one of its most dynamic and impactful subfields. In particular, the advancement of video generation foundation models has led to growing demand for…

Graphics · Computer Science · 2026-01-21 · Yue Ma, Kunyu Feng, Zhongyuan Hu, Xinyu Wang, Yucheng Wang, Mingzhe Zheng, Bingyuan Wang, Qinghe Wang, Xuanhua He, Hongfa Wang, Chenyang Zhu, Hongyu Liu, Yingqing He, Zeyu Wang, Zhifeng Li, Xiu Li, Sirui Han, Yike Guo, Wei Liu, Dan Xu, Linfeng Zhang, Qifeng Chen

Training-free Camera Control for Video Generation

We propose a training-free and robust solution to offer camera movement control for off-the-shelf video diffusion models. Unlike previous work, our method does not require any supervised finetuning on camera-annotated datasets or…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-26 · Chen Hou, Zhibo Chen

Generative Photographic Control for Scene-Consistent Video Cinematic Editing

Cinematic storytelling is profoundly shaped by the artful manipulation of photographic elements such as depth of field and exposure. These effects are crucial in conveying mood and creating aesthetic appeal. However, controlling these…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-18 · Huiqiang Sun, Liao Shen, Zhan Peng, Kun Wang, Size Wu, Yuhang Zang, Tianqi Liu, Zihao Huang, Xingyu Zeng, Zhiguo Cao, Wei Li, Chen Change Loy

EditCtrl: Disentangled Local and Global Control for Real-Time Generative Video Editing

High-fidelity generative video editing has seen significant quality improvements by leveraging pre-trained video foundation models. However, their computational cost is a major bottleneck, as they are often designed to inefficiently process…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-02 · Yehonathan Litman, Shikun Liu, Dario Seyb, Nicholas Milef, Yang Zhou, Carl Marshall, Shubham Tulsiani, Caleb Leak

Control-A-Video: Controllable Text-to-Video Diffusion Models with Motion Prior and Reward Feedback Learning

Recent advances in text-to-image (T2I) diffusion models have enabled impressive image generation capabilities guided by text prompts. However, extending these techniques to video generation remains challenging, with existing text-to-video…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-13 · Weifeng Chen, Yatai Ji, Jie Wu, Hefeng Wu, Pan Xie, Jiashi Li, Xin Xia, Xuefeng Xiao, Liang Lin