Related papers: Diffusion4D: Fast Spatial-temporal Consistent 4D G…

4Diffusion: Multi-view Video Diffusion Model for 4D Generation

Current 4D generation methods have achieved noteworthy efficacy with the aid of advanced diffusion generative models. However, these methods lack multi-view spatial-temporal modeling and encounter challenges in integrating diverse prior…

Computer Vision and Pattern Recognition · Computer Science 2024-10-23 Haiyu Zhang, Xinyuan Chen, Yaohui Wang, Xihui Liu, Yunhong Wang, Yu Qiao

Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models

Recent advancements in 3D generation are predominantly propelled by improvements in 3D-aware image diffusion models. These models are pretrained on Internet-scale image data and fine-tuned on massive 3D data, offering the capability of…

Computer Vision and Pattern Recognition · Computer Science 2024-10-03 Zeyu Yang, Zijie Pan, Chun Gu, Li Zhang

Dream4D: Lifting Camera-Controlled I2V towards Spatiotemporally Consistent 4D Generation

The synthesis of spatiotemporally coherent 4D content presents fundamental challenges in computer vision, requiring simultaneous modeling of high-fidelity spatial representations and physically plausible temporal dynamics. Current…

Computer Vision and Pattern Recognition · Computer Science 2025-12-01 Xiaoyan Liu, Kangrui Li, Yuehao Song, Jiaxin Liu

FB-4D: Spatial-Temporal Coherent Dynamic 3D Content Generation with Feature Banks

With the rapid advancements in diffusion models and 3D generation techniques, dynamic 3D content generation has become a crucial research area. However, achieving high-fidelity 4D (dynamic 3D) generation with strong spatial-temporal…

Computer Vision and Pattern Recognition · Computer Science 2025-03-27 Jinwei Li, Huan-ang Gao, Wenyi Li, Haohan Chi, Chenyu Liu, Chenxi Du, Yiqian Liu, Mingju Gao, Guiyu Zhang, Zongzheng Zhang, Li Yi, Yao Yao, Jingwei Zhao, Hongyang Li, Yikai Wang, Hao Zhao

4DVD: Cascaded Dense-view Video Diffusion Model for High-quality 4D Content Generation

Given the high complexity of directly generating high-dimensional data such as 4D, we present 4DVD, a cascaded video diffusion model that generates 4D content in a decoupled manner. Unlike previous multi-view video methods that directly…

Computer Vision and Pattern Recognition · Computer Science 2025-08-07 Shuzhou Yang, Xiaodong Cun, Xiaoyu Li, Yaowei Li, Jian Zhang

SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency

We present Stable Video 4D (SV4D), a latent video diffusion model for multi-frame and multi-view consistent dynamic 3D content generation. Unlike previous methods that rely on separately trained generative models for video generation and…

Computer Vision and Pattern Recognition · Computer Science 2025-03-03 Yiming Xie, Chun-Han Yao, Vikram Voleti, Huaizu Jiang, Varun Jampani

SV4D 2.0: Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation

We present Stable Video 4D 2.0 (SV4D 2.0), a multi-view video diffusion model for dynamic 3D asset generation. Compared to its predecessor SV4D, SV4D 2.0 is more robust to occlusions and large motion, generalizes better to real-world…

Computer Vision and Pattern Recognition · Computer Science 2025-03-26 Chun-Han Yao, Yiming Xie, Vikram Voleti, Huaizu Jiang, Varun Jampani

STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians

Recent progress in pre-trained diffusion models and 3D generation has spurred interest in 4D content creation. However, achieving high-fidelity 4D generation with spatial-temporal consistency remains a challenge. In this work, we propose…

Computer Vision and Pattern Recognition · Computer Science 2024-03-25 Yifei Zeng, Yanqin Jiang, Siyu Zhu, Yuanxun Lu, Youtian Lin, Hao Zhu, Weiming Hu, Xun Cao, Yao Yao

Splat4D: Diffusion-Enhanced 4D Gaussian Splatting for Temporally and Spatially Consistent Content Creation

Generating high-quality 4D content from monocular videos for applications such as digital humans and AR/VR poses challenges in ensuring temporal and spatial consistency, preserving intricate details, and incorporating user guidance…

Computer Vision and Pattern Recognition · Computer Science 2025-08-12 Minghao Yin, Yukang Cao, Songyou Peng, Kai Han

A Unified Approach for Text- and Image-guided 4D Scene Generation

Large-scale diffusion generative models are greatly simplifying image, video and 3D asset creation from user-provided text prompts and images. However, the challenging problem of text-to-4D dynamic 3D scene generation with diffusion…

Computer Vision and Pattern Recognition · Computer Science 2024-05-08 Yufeng Zheng, Xueting Li, Koki Nagano, Sifei Liu, Karsten Kreis, Otmar Hilliges, Shalini De Mello

Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency

We present Free4D, a novel tuning-free framework for 4D scene generation from a single image. Existing methods either focus on object-level generation, making scene-level generation infeasible, or rely on large-scale multi-view video…

Computer Vision and Pattern Recognition · Computer Science 2025-03-27 Tianqi Liu, Zihao Huang, Zhaoxi Chen, Guangcong Wang, Shoukang Hu, Liao Shen, Huiqiang Sun, Zhiguo Cao, Wei Li, Ziwei Liu

Efficient4D: Fast Dynamic 3D Object Generation from a Single-view Video

Generating a dynamic 3D object from a single-view video is challenging due to the lack of 4D labeled data. An intuitive approach is to extend previous image-to-3D pipelines by transferring off-the-shelf image generation models such as score…

Computer Vision and Pattern Recognition · Computer Science 2026-01-30 Zijie Pan, Zeyu Yang, Xiatian Zhu, Li Zhang

Phys4D: Fine-Grained Physics-Consistent 4D Modeling from Video Diffusion

Recent video diffusion models have achieved impressive capabilities as large-scale generative world models. However, these models often struggle with fine-grained physical consistency, exhibiting physically implausible dynamics over time.…

Computer Vision and Pattern Recognition · Computer Science 2026-03-09 Haoran Lu, Shang Wu, Jianshu Zhang, Maojiang Su, Guo Ye, Chenwei Xu, Lie Lu, Pranav Maneriker, Fan Du, Manling Li, Zhaoran Wang, Han Liu

V3D: Video Diffusion Models are Effective 3D Generators

Automatic 3D generation has recently attracted widespread attention. Recent methods have greatly accelerated the generation speed, but usually produce less-detailed objects due to limited model capacity or 3D data. Motivated by recent…

Computer Vision and Pattern Recognition · Computer Science 2024-03-12 Zilong Chen, Yikai Wang, Feng Wang, Zhengyi Wang, Huaping Liu

Vid3D: Synthesis of Dynamic 3D Scenes using 2D Video Diffusion

A recent frontier in computer vision has been the task of 3D video generation, which consists of generating a time-varying 3D representation of a scene. To generate dynamic 3D scenes, current methods explicitly model 3D temporal dynamics by…

Computer Vision and Pattern Recognition · Computer Science 2024-08-01 Rishab Parthasarathy, Zachary Ankner, Aaron Gokaslan

STP4D: Spatio-Temporal-Prompt Consistent Modeling for Text-to-4D Gaussian Splatting

Text-to-4D generation is rapidly developing and widely applied in various scenarios. However, existing methods often fail to incorporate adequate spatio-temporal modeling and prompt alignment within a unified framework, resulting in…

Computer Vision and Pattern Recognition · Computer Science 2025-04-28 Yunze Deng, Haijun Xiong, Bin Feng, Xinggang Wang, Wenyu Liu

DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation

Current generative models struggle to synthesize dynamic 4D driving scenes that simultaneously support temporal extrapolation and spatial novel view synthesis (NVS) without per-scene optimization. A key challenge lies in finding an…

Computer Vision and Pattern Recognition · Computer Science 2025-03-20 Jiazhe Guo, Yikang Ding, Xiwu Chen, Shuo Chen, Bohan Li, Yingshuang Zou, Xiaoyang Lyu, Feiyang Tan, Xiaojuan Qi, Zhiheng Li, Hao Zhao

Phy124: Fast Physics-Driven 4D Content Generation from a Single Image

4D content generation focuses on creating dynamic 3D objects that change over time. Existing methods primarily rely on pre-trained video diffusion models, utilizing sampling processes or reference videos. However, these approaches face…

Computer Vision and Pattern Recognition · Computer Science 2024-09-12 Jiajing Lin, Zhenzhong Wang, Yongjie Hou, Yuzhou Tang, Min Jiang

Orthogonal Spatial-temporal Distributional Transfer for 4D Generation

In the AIGC era, generating high-quality 4D content has garnered increasing research attention. Unfortunately, current 4D synthesis research is severely constrained by the lack of large-scale 4D datasets, preventing models from adequately…

Computer Vision and Pattern Recognition · Computer Science 2026-03-06 Wei Liu, Shengqiong Wu, Bobo Li, Haoyu Zhao, Hao Fei, Mong-Li Lee, Wynne Hsu

Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models

Despite tremendous progress in image-to-3D generation, existing methods still struggle to produce multi-view consistent images with high-resolution textures in detail, especially in the paradigm of 2D diffusion that lacks 3D…

Computer Vision and Pattern Recognition · Computer Science 2024-09-12 Haibo Yang, Yang Chen, Yingwei Pan, Ting Yao, Zhineng Chen, Chong-Wah Ngo, Tao Mei