Related papers

Related papers: Diffusion4D: Fast Spatial-temporal Consistent 4D G…

200 papers

Current 4D generation methods have achieved noteworthy efficacy with the aid of advanced diffusion generative models. However, these methods lack multi-view spatial-temporal modeling and encounter challenges in integrating diverse prior…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-23 · Haiyu Zhang, Xinyuan Chen, Yaohui Wang, Xihui Liu, Yunhong Wang, Yu Qiao

Recent advancements in 3D generation are predominantly propelled by improvements in 3D-aware image diffusion models. These models are pretrained on Internet-scale image data and fine-tuned on massive 3D data, offering the capability of…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-03 · Zeyu Yang, Zijie Pan, Chun Gu, Li Zhang

The synthesis of spatiotemporally coherent 4D content presents fundamental challenges in computer vision, requiring simultaneous modeling of high-fidelity spatial representations and physically plausible temporal dynamics. Current…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-01 · Xiaoyan Liu, Kangrui Li, Yuehao Song, Jiaxin Liu

With the rapid advancements in diffusion models and 3D generation techniques, dynamic 3D content generation has become a crucial research area. However, achieving high-fidelity 4D (dynamic 3D) generation with strong spatial-temporal…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-27 · Jinwei Li, Huan-ang Gao, Wenyi Li, Haohan Chi, Chenyu Liu, Chenxi Du, Yiqian Liu, Mingju Gao, Guiyu Zhang, Zongzheng Zhang, Li Yi, Yao Yao, Jingwei Zhao, Hongyang Li, Yikai Wang, Hao Zhao

Given the high complexity of directly generating high-dimensional data such as 4D, we present 4DVD, a cascaded video diffusion model that generates 4D content in a decoupled manner. Unlike previous multi-view video methods that directly…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-07 · Shuzhou Yang, Xiaodong Cun, Xiaoyu Li, Yaowei Li, Jian Zhang

We present Stable Video 4D (SV4D), a latent video diffusion model for multi-frame and multi-view consistent dynamic 3D content generation. Unlike previous methods that rely on separately trained generative models for video generation and…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-03 · Yiming Xie, Chun-Han Yao, Vikram Voleti, Huaizu Jiang, Varun Jampani

We present Stable Video 4D 2.0 (SV4D 2.0), a multi-view video diffusion model for dynamic 3D asset generation. Compared to its predecessor SV4D, SV4D 2.0 is more robust to occlusions and large motion, generalizes better to real-world…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-26 · Chun-Han Yao, Yiming Xie, Vikram Voleti, Huaizu Jiang, Varun Jampani

Recent progress in pre-trained diffusion models and 3D generation has spurred interest in 4D content creation. However, achieving high-fidelity 4D generation with spatial-temporal consistency remains a challenge. In this work, we propose…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-25 · Yifei Zeng, Yanqin Jiang, Siyu Zhu, Yuanxun Lu, Youtian Lin, Hao Zhu, Weiming Hu, Xun Cao, Yao Yao

Generating high-quality 4D content from monocular videos for applications such as digital humans and AR/VR poses challenges in ensuring temporal and spatial consistency, preserving intricate details, and incorporating user guidance…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-12 · Minghao Yin, Yukang Cao, Songyou Peng, Kai Han

Large-scale diffusion generative models are greatly simplifying image, video and 3D asset creation from user-provided text prompts and images. However, the challenging problem of text-to-4D dynamic 3D scene generation with diffusion…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-08 · Yufeng Zheng, Xueting Li, Koki Nagano, Sifei Liu, Karsten Kreis, Otmar Hilliges, Shalini De Mello

We present Free4D, a novel tuning-free framework for 4D scene generation from a single image. Existing methods either focus on object-level generation, making scene-level generation infeasible, or rely on large-scale multi-view video…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-27 · Tianqi Liu, Zihao Huang, Zhaoxi Chen, Guangcong Wang, Shoukang Hu, Liao Shen, Huiqiang Sun, Zhiguo Cao, Wei Li, Ziwei Liu

Generating a dynamic 3D object from a single-view video is challenging due to the lack of 4D labeled data. An intuitive approach is to extend previous image-to-3D pipelines by transferring off-the-shelf image generation models such as score…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-30 · Zijie Pan, Zeyu Yang, Xiatian Zhu, Li Zhang

Recent video diffusion models have achieved impressive capabilities as large-scale generative world models. However, these models often struggle with fine-grained physical consistency, exhibiting physically implausible dynamics over time.…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-09 · Haoran Lu, Shang Wu, Jianshu Zhang, Maojiang Su, Guo Ye, Chenwei Xu, Lie Lu, Pranav Maneriker, Fan Du, Manling Li, Zhaoran Wang, Han Liu

Automatic 3D generation has recently attracted widespread attention. Recent methods have greatly accelerated the generation speed, but usually produce less-detailed objects due to limited model capacity or 3D data. Motivated by recent…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-12 · Zilong Chen, Yikai Wang, Feng Wang, Zhengyi Wang, Huaping Liu

A recent frontier in computer vision has been the task of 3D video generation, which consists of generating a time-varying 3D representation of a scene. To generate dynamic 3D scenes, current methods explicitly model 3D temporal dynamics by…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-01 · Rishab Parthasarathy, Zachary Ankner, Aaron Gokaslan

Text-to-4D generation is rapidly developing and widely applied in various scenarios. However, existing methods often fail to incorporate adequate spatio-temporal modeling and prompt alignment within a unified framework, resulting in…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-28 · Yunze Deng, Haijun Xiong, Bin Feng, Xinggang Wang, Wenyu Liu

Current generative models struggle to synthesize dynamic 4D driving scenes that simultaneously support temporal extrapolation and spatial novel view synthesis (NVS) without per-scene optimization. A key challenge lies in finding an…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-20 · Jiazhe Guo, Yikang Ding, Xiwu Chen, Shuo Chen, Bohan Li, Yingshuang Zou, Xiaoyang Lyu, Feiyang Tan, Xiaojuan Qi, Zhiheng Li, Hao Zhao

4D content generation focuses on creating dynamic 3D objects that change over time. Existing methods primarily rely on pre-trained video diffusion models, utilizing sampling processes or reference videos. However, these approaches face…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-12 · Jiajing Lin, Zhenzhong Wang, Yongjie Hou, Yuzhou Tang, Min Jiang

In the AIGC era, generating high-quality 4D content has garnered increasing research attention. Unfortunately, current 4D synthesis research is severely constrained by the lack of large-scale 4D datasets, preventing models from adequately…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-06 · Wei Liu, Shengqiong Wu, Bobo Li, Haoyu Zhao, Hao Fei, Mong-Li Lee, Wynne Hsu

Despite tremendous progress in image-to-3D generation, existing methods still struggle to produce multi-view consistent images with detailed, high-resolution textures, especially in the paradigm of 2D diffusion that lacks 3D…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-12 · Haibo Yang, Yang Chen, Yingwei Pan, Ting Yao, Zhineng Chen, Chong-Wah Ngo, Tao Mei