Related papers: Collaborative Video Diffusion: Consistent Multi-vi…

Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention

In recent years there have been remarkable breakthroughs in image-to-video generation. However, the 3D consistency and camera controllability of generated frames have remained unsolved. Recent studies have attempted to incorporate camera…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-15 · Dejia Xu, Yifan Jiang, Chen Huang, Liangchen Song, Thorsten Gernoth, Liangliang Cao, Zhangyang Wang, Hao Tang

View-Consistent Diffusion Representations for 3D-Consistent Video Generation

Video generation models have made significant progress in generating realistic content, enabling applications in simulation, gaming, and film making. However, current generated videos still contain visual artifacts arising from 3D…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-25 · Duolikun Danier, Ge Gao, Steven McDonagh, Changjian Li, Hakan Bilen, Oisin Mac Aodha

World-consistent Video Diffusion with Explicit 3D Modeling

Recent advancements in diffusion models have set new benchmarks in image and video generation, enabling realistic visual synthesis across single- and multi-frame contexts. However, these models still struggle with efficiently and explicitly…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-03 · Qihang Zhang, Shuangfei Zhai, Miguel Angel Bautista, Kevin Miao, Alexander Toshev, Joshua Susskind, Jiatao Gu

DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model

With the increasing popularity of autonomous driving based on the powerful and unified bird's-eye-view (BEV) representation, a demand for high-quality and large-scale multi-view video data with accurate annotation is urgently required.…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-13 · Xiaofan Li, Yifu Zhang, Xiaoqing Ye

GD-VDM: Generated Depth for better Diffusion-based Video Generation

The field of generative models has recently witnessed significant progress, with diffusion models showing remarkable performance in image generation. In light of this success, there is a growing interest in exploring the application of…

Computer Vision and Pattern Recognition · Computer Science · 2023-06-21 · Ariel Lapid, Idan Achituve, Lior Bracha, Ethan Fetaya

Vid3D: Synthesis of Dynamic 3D Scenes using 2D Video Diffusion

A recent frontier in computer vision has been the task of 3D video generation, which consists of generating a time-varying 3D representation of a scene. To generate dynamic 3D scenes, current methods explicitly model 3D temporal dynamics by…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-01 · Rishab Parthasarathy, Zachary Ankner, Aaron Gokaslan

FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis

We present FloVD, a novel video diffusion model for camera-controllable video generation. FloVD leverages optical flow to represent the motions of the camera and moving objects. This approach offers two key benefits. Since optical flow can…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-26 · Wonjoon Jin, Qi Dai, Chong Luo, Seung-Hwan Baek, Sunghyun Cho

Vivid-ZOO: Multi-View Video Generation with Diffusion Model

While diffusion models have shown impressive performance in 2D image/video generation, diffusion-based Text-to-Multi-view-Video (T2MVid) generation remains underexplored. The new challenges posed by T2MVid generation lie in the lack of…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-14 · Bing Li, Cheng Zheng, Wenxuan Zhu, Jinjie Mai, Biao Zhang, Peter Wonka, Bernard Ghanem

MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation

Recent advances in text-to-image generation with diffusion models present transformative capabilities in image quality. However, user controllability of the generated image, and fast adaptation to new tasks still remains an open challenge,…

Computer Vision and Pattern Recognition · Computer Science · 2023-02-17 · Omer Bar-Tal, Lior Yariv, Yaron Lipman, Tali Dekel

CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation

Recently video diffusion models have emerged as expressive generative tools for high-quality video content creation readily available to general users. However, these models often do not offer precise control over camera poses for video…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-05 · Dejia Xu, Weili Nie, Chao Liu, Sifei Liu, Jan Kautz, Zhangyang Wang, Arash Vahdat

Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models

Traditional 3D content creation tools empower users to bring their imagination to life by giving them direct control over a scene's geometry, appearance, motion, and camera path. Creating computer-generated videos, however, is a tedious…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-05 · Shengqu Cai, Duygu Ceylan, Matheus Gadelha, Chun-Hao Paul Huang, Tuanfeng Yang Wang, Gordon Wetzstein

DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer

Collecting multi-view driving scenario videos to enhance the performance of 3D visual perception tasks presents significant challenges and incurs substantial costs, making generative models for realistic data an appealing alternative. Yet,…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-29 · Junpeng Jiang, Gangyi Hong, Miao Zhang, Hengtong Hu, Kun Zhan, Rui Shao, Liqiang Nie

Ctrl-V: Higher Fidelity Video Generation with Bounding-Box Controlled Object Motion

Controllable video generation has attracted significant attention, largely due to advances in video diffusion models. In domains such as autonomous driving, it is essential to develop highly accurate predictions for object motions. This…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-10 · Ge Ya Luo, Zhi Hao Luo, Anthony Gosselin, Alexia Jolicoeur-Martineau, Christopher Pal

CtrlVDiff: Controllable Video Generation via Unified Multimodal Video Diffusion

We tackle the dual challenges of video understanding and controllable video generation within a unified diffusion framework. Our key insights are two-fold: geometry-only cues (e.g., depth, edges) are insufficient: they specify layout but…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-27 · Dianbing Xi, Jiepeng Wang, Yuanzhi Liang, Xi Qiu, Jialun Liu, Hao Pan, Yuchi Huo, Rui Wang, Haibin Huang, Chi Zhang, Xuelong Li

CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving

Generative models have been widely applied to world modeling for environment simulation and future state prediction. With advancements in autonomous driving, there is a growing demand not only for high-fidelity video generation under…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-17 · Tianrui Zhang, Yichen Liu, Zilin Guo, Yuxin Guo, Jingcheng Ni, Chenjing Ding, Dan Xu, Lewei Lu, Zehuan Wu

V3D: Video Diffusion Models are Effective 3D Generators

Automatic 3D generation has recently attracted widespread attention. Recent methods have greatly accelerated the generation speed, but usually produce less-detailed objects due to limited model capacity or 3D data. Motivated by recent…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-12 · Zilong Chen, Yikai Wang, Feng Wang, Zhengyi Wang, Huaping Liu

Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition

Video diffusion models have recently made great progress in generation quality, but are still limited by the high memory and computational requirements. This is because current video diffusion models often attempt to process…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-22 · Sihyun Yu, Weili Nie, De-An Huang, Boyi Li, Jinwoo Shin, Anima Anandkumar

JVID: Joint Video-Image Diffusion for Visual-Quality and Temporal-Consistency in Video Generation

We introduce the Joint Video-Image Diffusion model (JVID), a novel approach to generating high-quality and temporally coherent videos. We achieve this by integrating two diffusion models: a Latent Image Diffusion Model (LIDM) trained on…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-30 · Hadrien Reynaud, Matthew Baugh, Mischa Dombrowski, Sarah Cechnicka, Qingjie Meng, Bernhard Kainz

CMD: Controllable Multiview Diffusion for 3D Editing and Progressive Generation

Recently, 3D generation methods have shown their powerful ability to automate 3D model creation. However, most 3D generation methods only rely on an input image or a text prompt to generate a 3D model, which lacks the control of each…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-08 · Peng Li, Suizhi Ma, Jialiang Chen, Yuan Liu, Congyi Zhang, Wei Xue, Wenhan Luo, Alla Sheffer, Wenping Wang, Yike Guo

MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation

Video prediction is a challenging task. The quality of video frames from current state-of-the-art (SOTA) generative models tends to be poor and generalization beyond the training data is difficult. Furthermore, existing prediction…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-14 · Vikram Voleti, Alexia Jolicoeur-Martineau, Christopher Pal