Enabling Versatile Controls for Video Diffusion Models

Xu Zhang; Hao Zhou; Haoming Qin; Xiaobin Lu; Jiaxing Yan; Guanzhong Wang; Zeyu Chen; Yi Liu

Computer Vision and Pattern Recognition; Artificial Intelligence | 2025-03-24 (v1)

Abstract

Despite substantial progress in text-to-video generation, precise and flexible control over fine-grained spatiotemporal attributes remains a significant open challenge. To address this limitation, we introduce VCtrl (also termed PP-VCtrl), a novel framework that enables fine-grained control over pre-trained video diffusion models in a unified manner. VCtrl integrates diverse user-specified control signals, such as Canny edges, segmentation masks, and human keypoints, into pre-trained video diffusion models via a generalizable conditional module that uniformly encodes multiple types of auxiliary signals without modifying the underlying generator. Additionally, we design a unified control signal encoding pipeline and a sparse residual connection mechanism to incorporate control representations efficiently. Comprehensive experiments and human evaluations demonstrate that VCtrl improves both controllability and generation quality. The source code and pre-trained models are implemented in the PaddlePaddle framework and are publicly available at http://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/ppvctrl.
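
To make the idea of a sparse residual connection concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes that a small control branch produces features that are added to the hidden state of a frozen base network only at every `stride`-th block. The function name `run_with_sparse_residuals`, the toy blocks, and all shapes are hypothetical and chosen purely for illustration; the actual VCtrl architecture is described in the paper and the linked repository.

```python
import numpy as np


def run_with_sparse_residuals(x, control, base_blocks, control_blocks, stride):
    """Run frozen base blocks, injecting a control residual before every
    `stride`-th block (hypothetical illustration, not the VCtrl code)."""
    h, c = x, control
    for i, block in enumerate(base_blocks):
        j = i // stride
        if i % stride == 0 and j < len(control_blocks):
            c = control_blocks[j](c)  # update the control features
            h = h + c                 # residual injection into the base stream
        h = block(h)                  # base block weights are never modified
    return h


rng = np.random.default_rng(0)
dim, tokens = 8, 4

# Toy "frozen generator": six residual blocks with fixed random weights.
base_weights = [rng.standard_normal((dim, dim)) * 0.1 for _ in range(6)]
base_blocks = [lambda h, W=W: h + np.tanh(h @ W) for W in base_weights]

# Toy control branch: three blocks, one for every second base block.
ctrl_weights = [rng.standard_normal((dim, dim)) * 0.1 for _ in range(3)]
control_blocks = [lambda c, W=W: np.tanh(c @ W) for W in ctrl_weights]
zero_blocks = [lambda c: np.zeros_like(c) for _ in range(3)]

x = rng.standard_normal((tokens, dim))        # stand-in for video latents
control = rng.standard_normal((tokens, dim))  # stand-in for an encoded control signal

# Reference output of the base network alone.
plain = x
for block in base_blocks:
    plain = block(plain)

unchanged = run_with_sparse_residuals(x, control, base_blocks, zero_blocks, stride=2)
controlled = run_with_sparse_residuals(x, control, base_blocks, control_blocks, stride=2)
```

In this sketch, a control branch that outputs zeros leaves the base network's output unchanged, which is one common reason for adding control features residually rather than altering the generator itself.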

Keywords

video generation; diffusion model; video retrieval

Cite

@article{arxiv.2503.16983,
  title  = {Enabling Versatile Controls for Video Diffusion Models},
  author = {Xu Zhang and Hao Zhou and Haoming Qin and Xiaobin Lu and Jiaxing Yan and Guanzhong Wang and Zeyu Chen and Yi Liu},
  journal= {arXiv preprint arXiv:2503.16983},
  year   = {2025}
}

Comments

Code and supplementary material: http://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/ppvctrl
