Related papers: 4Diffusion: Multi-view Video Diffusion Model for 4…

Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models

The availability of large-scale multimodal datasets and advancements in diffusion models have significantly accelerated progress in 4D content generation. Most prior approaches rely on multiple image or video diffusion models, utilizing…

Computer Vision and Pattern Recognition · Computer Science 2024-05-28 Hanwen Liang, Yuyang Yin, Dejia Xu, Hanxue Liang, Zhangyang Wang, Konstantinos N. Plataniotis, Yao Zhao, Yunchao Wei

Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models

Recent advancements in 3D generation are predominantly propelled by improvements in 3D-aware image diffusion models. These models are pretrained on Internet-scale image data and fine-tuned on massive 3D data, offering the capability of…

Computer Vision and Pattern Recognition · Computer Science 2024-10-03 Zeyu Yang, Zijie Pan, Chun Gu, Li Zhang

Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion

Multi-view or 4D video generation has emerged as a significant research topic. Nonetheless, recent approaches to 4D generation still struggle with fundamental limitations, as they primarily rely on harnessing multiple video diffusion models…

Computer Vision and Pattern Recognition · Computer Science 2025-12-05 Jangho Park, Taesung Kwon, Jong Chul Ye

SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency

We present Stable Video 4D (SV4D), a latent video diffusion model for multi-frame and multi-view consistent dynamic 3D content generation. Unlike previous methods that rely on separately trained generative models for video generation and…

Computer Vision and Pattern Recognition · Computer Science 2025-03-03 Yiming Xie, Chun-Han Yao, Vikram Voleti, Huaizu Jiang, Varun Jampani

Diffusion Priors for Dynamic View Synthesis from Monocular Videos

Dynamic novel view synthesis aims to capture the temporal evolution of visual content within videos. Existing methods struggle to distinguish between motion and structure, particularly in scenarios where camera poses are either unknown…

Computer Vision and Pattern Recognition · Computer Science 2024-01-12 Chaoyang Wang, Peiye Zhuang, Aliaksandr Siarohin, Junli Cao, Guocheng Qian, Hsin-Ying Lee, Sergey Tulyakov

Accelerating Video Diffusion Models via Distribution Matching

Generative models, particularly diffusion models, have achieved significant success in data synthesis across various modalities, including images, videos, and 3D assets. However, current diffusion models are computationally intensive, often…

Computer Vision and Pattern Recognition · Computer Science 2024-12-10 Yuanzhi Zhu, Hanshu Yan, Huan Yang, Kai Zhang, Junnan Li

A Unified Approach for Text- and Image-guided 4D Scene Generation

Large-scale diffusion generative models are greatly simplifying image, video and 3D asset creation from user-provided text prompts and images. However, the challenging problem of text-to-4D dynamic 3D scene generation with diffusion…

Computer Vision and Pattern Recognition · Computer Science 2024-05-08 Yufeng Zheng, Xueting Li, Koki Nagano, Sifei Liu, Karsten Kreis, Otmar Hilliges, Shalini De Mello

Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation

Text-to-3D generation has shown rapid progress recently with the advent of score distillation, a methodology of using pretrained text-to-2D diffusion models to optimize a neural radiance field (NeRF) in the zero-shot setting. However,…

Computer Vision and Pattern Recognition · Computer Science 2024-02-07 Junyoung Seo, Wooseok Jang, Min-Seop Kwak, Hyeonsu Kim, Jaehoon Ko, Junho Kim, Jin-Hwa Kim, Jiyoung Lee, Seungryong Kim

4DVD: Cascaded Dense-view Video Diffusion Model for High-quality 4D Content Generation

Given the high complexity of directly generating high-dimensional data such as 4D, we present 4DVD, a cascaded video diffusion model that generates 4D content in a decoupled manner. Unlike previous multi-view video methods that directly…

Computer Vision and Pattern Recognition · Computer Science 2025-08-07 Shuzhou Yang, Xiaodong Cun, Xiaoyu Li, Yaowei Li, Jian Zhang

4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion

We propose 4Real-Video, a novel framework for generating 4D videos, organized as a grid of video frames with both time and viewpoint axes. In this grid, each row contains frames sharing the same timestep, while each column contains frames…

Computer Vision and Pattern Recognition · Computer Science 2024-12-06 Chaoyang Wang, Peiye Zhuang, Tuan Duc Ngo, Willi Menapace, Aliaksandr Siarohin, Michael Vasilkovsky, Ivan Skorokhodov, Sergey Tulyakov, Peter Wonka, Hsin-Ying Lee

4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models

Existing dynamic scene generation methods mostly rely on distilling knowledge from pre-trained 3D generative models, which are typically fine-tuned on synthetic object datasets. As a result, the generated scenes are often object-centric and…

Computer Vision and Pattern Recognition · Computer Science 2024-11-22 Heng Yu, Chaoyang Wang, Peiye Zhuang, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Laszlo A Jeni, Sergey Tulyakov, Hsin-Ying Lee

STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians

Recent progress in pre-trained diffusion models and 3D generation has spurred interest in 4D content creation. However, achieving high-fidelity 4D generation with spatial-temporal consistency remains a challenge. In this work, we propose…

Computer Vision and Pattern Recognition · Computer Science 2024-03-25 Yifei Zeng, Yanqin Jiang, Siyu Zhu, Yuanxun Lu, Youtian Lin, Hao Zhu, Weiming Hu, Xun Cao, Yao Yao

Dream4D: Lifting Camera-Controlled I2V towards Spatiotemporally Consistent 4D Generation

The synthesis of spatiotemporally coherent 4D content presents fundamental challenges in computer vision, requiring simultaneous modeling of high-fidelity spatial representations and physically plausible temporal dynamics. Current…

Computer Vision and Pattern Recognition · Computer Science 2025-12-01 Xiaoyan Liu, Kangrui Li, Yuehao Song, Jiaxin Liu

FB-4D: Spatial-Temporal Coherent Dynamic 3D Content Generation with Feature Banks

With the rapid advancements in diffusion models and 3D generation techniques, dynamic 3D content generation has become a crucial research area. However, achieving high-fidelity 4D (dynamic 3D) generation with strong spatial-temporal…

Computer Vision and Pattern Recognition · Computer Science 2025-03-27 Jinwei Li, Huan-ang Gao, Wenyi Li, Haohan Chi, Chenyu Liu, Chenxi Du, Yiqian Liu, Mingju Gao, Guiyu Zhang, Zongzheng Zhang, Li Yi, Yao Yao, Jingwei Zhao, Hongyang Li, Yikai Wang, Hao Zhao

MVDream: Multi-view Diffusion for 3D Generation

We introduce MVDream, a diffusion model that is able to generate consistent multi-view images from a given text prompt. Learning from both 2D and 3D data, a multi-view diffusion model can achieve the generalizability of 2D diffusion models…

Computer Vision and Pattern Recognition · Computer Science 2024-04-19 Yichun Shi, Peng Wang, Jianglong Ye, Mai Long, Kejie Li, Xiao Yang

Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models

This paper addresses the challenge of high-fidelity view synthesis of humans with sparse-view videos as input. Previous methods solve the issue of insufficient observation by leveraging 4D diffusion models to generate videos at novel…

Computer Vision and Pattern Recognition · Computer Science 2025-07-18 Yudong Jin, Sida Peng, Xuan Wang, Tao Xie, Zhen Xu, Yifan Yang, Yujun Shen, Hujun Bao, Xiaowei Zhou

DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model

With the increasing popularity of autonomous driving based on the powerful and unified bird's-eye-view (BEV) representation, high-quality and large-scale multi-view video data with accurate annotation is urgently required.…

Computer Vision and Pattern Recognition · Computer Science 2023-10-13 Xiaofan Li, Yifu Zhang, Xiaoqing Ye

Vid3D: Synthesis of Dynamic 3D Scenes using 2D Video Diffusion

A recent frontier in computer vision has been the task of 3D video generation, which consists of generating a time-varying 3D representation of a scene. To generate dynamic 3D scenes, current methods explicitly model 3D temporal dynamics by…

Computer Vision and Pattern Recognition · Computer Science 2024-08-01 Rishab Parthasarathy, Zachary Ankner, Aaron Gokaslan

Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation

While recent foundational video generators produce visually rich output, they still struggle with appearance drift, where objects gradually degrade or change inconsistently across frames, breaking visual coherence. We hypothesize that this…

Computer Vision and Pattern Recognition · Computer Science 2025-04-08 Hyeonho Jeong, Chun-Hao Paul Huang, Jong Chul Ye, Niloy Mitra, Duygu Ceylan

Video Diffusion Models

Generating temporally coherent high fidelity video is an important milestone in generative modeling research. We make progress towards this milestone by proposing a diffusion model for video generation that shows very promising initial…

Computer Vision and Pattern Recognition · Computer Science 2022-06-24 Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, David J. Fleet