Related papers: Cinematographic Camera Diffusion Model

VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control

Modern text-to-video synthesis models demonstrate coherent, photorealistic generation of complex videos from a text description. However, most existing models lack fine-grained control over camera movement, which is critical for downstream…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Sherwin Bahmani, Ivan Skorokhodov, Aliaksandr Siarohin, Willi Menapace, Guocheng Qian, Michael Vasilkovsky, Hsin-Ying Lee, Chaoyang Wang, Jiaxu Zou, Andrea Tagliasacchi, David B. Lindell, Sergey Tulyakov

Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models

Traditional 3D content creation tools empower users to bring their imagination to life by giving them direct control over a scene's geometry, appearance, motion, and camera path. Creating computer-generated videos, however, is a tedious…

Computer Vision and Pattern Recognition · Computer Science 2023-12-05 Shengqu Cai, Duygu Ceylan, Matheus Gadelha, Chun-Hao Paul Huang, Tuanfeng Yang Wang, Gordon Wetzstein

Curved Diffusion: A Generative Model With Optical Geometry Control

State-of-the-art diffusion models can generate highly realistic images based on various conditioning like text, segmentation, and depth. However, an essential aspect often overlooked is the specific camera geometry used during image…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Andrey Voynov, Amir Hertz, Moab Arar, Shlomi Fruchter, Daniel Cohen-Or

Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text

Recent advancements in 3D generation have leveraged synthetic datasets with ground truth 3D assets and predefined cameras. However, the potential of adopting real-world datasets, which can produce significantly more realistic 3D scenes,…

Computer Vision and Pattern Recognition · Computer Science 2024-06-26 Xinyang Li, Zhangyu Lai, Linning Xu, Yansong Qu, Liujuan Cao, Shengchuan Zhang, Bo Dai, Rongrong Ji

AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers

Numerous works have recently integrated 3D camera control into foundational text-to-video models, but the resulting camera control is often imprecise, and video generation quality suffers. In this work, we analyze camera motion from a first…

Computer Vision and Pattern Recognition · Computer Science 2025-05-07 Sherwin Bahmani, Ivan Skorokhodov, Guocheng Qian, Aliaksandr Siarohin, Willi Menapace, Andrea Tagliasacchi, David B. Lindell, Sergey Tulyakov

Design Booster: A Text-Guided Diffusion Model for Image Translation with Spatial Layout Preservation

Diffusion models are able to generate photorealistic images in arbitrary scenes. However, when applying diffusion models to image translation, there exists a trade-off between maintaining spatial structure and high-quality content. Besides,…

Computer Vision and Pattern Recognition · Computer Science 2023-02-07 Shiqi Sun, Shancheng Fang, Qian He, Wei Liu

StableVideo: Text-driven Consistency-aware Diffusion Video Editing

Diffusion-based methods can generate realistic images and videos, but they struggle to edit existing objects in a video while preserving their appearance over time. This prevents diffusion models from being applied to natural video editing…

Computer Vision and Pattern Recognition · Computer Science 2023-08-21 Wenhao Chai, Xun Guo, Gaoang Wang, Yan Lu

Diffusion Path Alignment for Long-Range Motion Generation and Domain Transitions

Long-range human movement generation remains a central challenge in computer vision and graphics. Generating coherent transitions across semantically distinct motion domains remains largely unexplored. This capability is particularly…

Computer Vision and Pattern Recognition · Computer Science 2026-04-07 Haichao Wang, Alexander Okupnik, Yuxing Han, Gene Wen, Johannes Schneider, Kyriakos Flouris

Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models

Diffusion models have achieved great progress in image animation due to powerful generative capabilities. However, maintaining spatio-temporal consistency with detailed information from the input static image over time (e.g., style,…

Computer Vision and Pattern Recognition · Computer Science 2024-07-24 Xin Ma, Yaohui Wang, Gengyun Jia, Xinyuan Chen, Yuan-Fang Li, Cunjian Chen, Yu Qiao

Automatic Camera Trajectory Control with Enhanced Immersion for Virtual Cinematography

User-generated cinematic creations are gaining popularity as our daily entertainment, yet it is a challenge to master cinematography for producing immersive contents. Many existing automatic methods focus on roughly controlling predefined…

Multimedia · Computer Science 2024-05-24 Xinyi Wu, Haohong Wang, Aggelos K. Katsaggelos

Taming Diffusion Probabilistic Models for Character Control

We present a novel character control framework that effectively utilizes motion diffusion probabilistic models to generate high-quality and diverse character animations, responding in real-time to a variety of dynamic user-supplied control…

Graphics · Computer Science 2024-04-24 Rui Chen, Mingyi Shi, Shaoli Huang, Ping Tan, Taku Komura, Xuelin Chen

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Text-to-video generation aims to produce a video based on a given prompt. Recently, several commercial video models have been able to generate plausible videos with minimal noise, excellent details, and high aesthetic scores. However, these…

Computer Vision and Pattern Recognition · Computer Science 2024-01-18 Haoxin Chen, Yong Zhang, Xiaodong Cun, Menghan Xia, Xintao Wang, Chao Weng, Ying Shan

Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation

Large text-to-image diffusion models have exhibited impressive proficiency in generating high-quality images. However, when applying these models to video domain, ensuring temporal consistency across video frames remains a formidable…

Computer Vision and Pattern Recognition · Computer Science 2023-09-19 Shuai Yang, Yifan Zhou, Ziwei Liu, Chen Change Loy

Stable Video-Driven Portraits

Portrait animation aims to generate photo-realistic videos from a single source image by reenacting the expression and pose from a driving video. While early methods relied on 3D morphable models or feature warping techniques, they often…

Computer Vision and Pattern Recognition · Computer Science 2025-09-23 Mallikarjun B. R., Fei Yin, Vikram Voleti, Nikita Drobyshev, Maksim Lapin, Aaryaman Vasishta, Varun Jampani

MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation

Recent advances in text-to-image generation with diffusion models present transformative capabilities in image quality. However, user controllability of the generated image, and fast adaptation to new tasks still remains an open challenge,…

Computer Vision and Pattern Recognition · Computer Science 2023-02-17 Omer Bar-Tal, Lior Yariv, Yaron Lipman, Tali Dekel

Fitting Image Diffusion Models on Video Datasets

Image diffusion models are trained on independently sampled static images. While this is the bedrock task protocol in generative modeling, capturing the temporal world through the lens of static snapshots is information-deficient by design.…

Computer Vision and Pattern Recognition · Computer Science 2025-09-05 Juhun Lee, Simon S. Woo

Strong and Controllable 3D Motion Generation

Human motion generation is a significant pursuit in generative computer vision with widespread applications in film-making, video games, AR/VR, and human-robot interaction. Current methods mainly utilize either diffusion-based generative…

Computer Vision and Pattern Recognition · Computer Science 2025-02-03 Canxuan Gang

CT-1: Vision-Language-Camera Models Transfer Spatial Reasoning Knowledge to Camera-Controllable Video Generation

Camera-controllable video generation aims to synthesize videos with flexible and physically plausible camera movements. However, existing methods either provide imprecise camera control from text prompts or rely on labor-intensive manual…

Computer Vision and Pattern Recognition · Computer Science 2026-04-13 Haoyu Zhao, Zihao Zhang, Jiaxi Gu, Haoran Chen, Qingping Zheng, Pin Tang, Yeyin Jin, Yuang Zhang, Junqi Cheng, Zenghui Lu, Peng Shu, Zuxuan Wu, Yu-Gang Jiang

Canvas-to-Image: Compositional Image Generation with Multimodal Controls

While modern diffusion models excel at generating high-quality and diverse images, they still struggle with high-fidelity compositional and multimodal control, particularly when users simultaneously specify text prompts, subject references,…

Computer Vision and Pattern Recognition · Computer Science 2025-11-27 Yusuf Dalva, Guocheng Gordon Qian, Maya Goldenberg, Tsai-Shien Chen, Kfir Aberman, Sergey Tulyakov, Pinar Yanardag, Kuan-Chieh Jackson Wang

MotionFlow: Attention-Driven Motion Transfer in Video Diffusion Models

Text-to-video models have demonstrated impressive capabilities in producing diverse and captivating video content, showcasing a notable advancement in generative AI. However, these models generally lack fine-grained control over motion…

Computer Vision and Pattern Recognition · Computer Science 2024-12-09 Tuna Han Salih Meral, Hidir Yesiltepe, Connor Dunlop, Pinar Yanardag