Related papers: Can Video Diffusion Model Reconstruct 4D Geometry?

Shape of Motion: 4D Reconstruction from a Single Video

Monocular dynamic reconstruction is a challenging and long-standing vision problem due to the highly ill-posed nature of the task. Existing approaches depend on templates, are effective only in quasi-static scenes, or fail to model 3D…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-17 · Qianqian Wang, Vickie Ye, Hang Gao, Weijia Zeng, Jake Austin, Zhengqi Li, Angjoo Kanazawa

4D3R: Motion-Aware Neural Reconstruction and Rendering of Dynamic Scenes from Monocular Videos

Novel view synthesis from monocular videos of dynamic scenes with unknown camera poses remains a fundamental challenge in computer vision and graphics. While recent advances in 3D representations such as Neural Radiance Fields (NeRF) and 3D…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-10 · Mengqi Guo, Bo Xu, Yanyan Li, Gim Hee Lee

Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction

We introduce Geo4D, a method to repurpose video diffusion models for monocular 3D reconstruction of dynamic scenes. By leveraging the strong dynamic priors captured by large-scale pre-trained video models, Geo4D can be trained using only…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-20 · Zeren Jiang, Chuanxia Zheng, Iro Laina, Diane Larlus, Andrea Vedaldi

VideoFrom3D: 3D Scene Video Generation via Complementary Image and Video Diffusion Models

In this paper, we propose VideoFrom3D, a novel framework for synthesizing high-quality 3D scene videos from coarse geometry, a camera trajectory, and a reference image. Our approach streamlines the 3D graphic design workflow, enabling…

Graphics · Computer Science · 2025-09-23 · Geonung Kim, Janghyeok Han, Sunghyun Cho

ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model

Advancements in 3D scene reconstruction have transformed 2D images from the real world into 3D models, producing realistic 3D results from hundreds of input photos. Despite great success in dense-view reconstruction scenarios, rendering a…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-26 · Fangfu Liu, Wenqiang Sun, Hanyang Wang, Yikai Wang, Haowen Sun, Junliang Ye, Jun Zhang, Yueqi Duan

VS3R: Robust Full-frame Video Stabilization via Deep 3D Reconstruction

Video stabilization aims to mitigate camera shake but faces a fundamental trade-off between geometric robustness and full-frame consistency. While 2D methods suffer from aggressive cropping, 3D techniques are often undermined by fragile…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-09 · Muhua Zhu, Xinhao Jin, Yu Zhang, Yifei Xue, Tie Ji, Yizhen Lao

Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation

Real-world applications like video gaming and virtual reality often demand the ability to model 3D scenes that users can explore along custom camera trajectories. While significant progress has been made in generating 3D objects from text…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-05 · Tianyu Huang, Wangguandong Zheng, Tengfei Wang, Yuhao Liu, Zhenwei Wang, Junta Wu, Jie Jiang, Hui Li, Rynson W. H. Lau, Wangmeng Zuo, Chunchao Guo

Driv3R: Learning Dense 4D Reconstruction for Autonomous Driving

Realtime 4D reconstruction for dynamic scenes remains a crucial challenge for autonomous driving perception. Most existing methods rely on depth estimation through self-supervision or multi-modality sensor fusion. In this paper, we propose…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-10 · Xin Fei, Wenzhao Zheng, Yueqi Duan, Wei Zhan, Masayoshi Tomizuka, Kurt Keutzer, Jiwen Lu

PAS3R: Pose-Adaptive Streaming 3D Reconstruction for Long Video Sequences

Online monocular 3D reconstruction enables dense scene recovery from streaming video but remains fundamentally limited by the stability-adaptation dilemma: the reconstruction model must rapidly incorporate novel viewpoints while preserving…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-24 · Lanbo Xu, Liang Guo, Caigui Jiang, Cheng Wang

Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image

Generating interactive and dynamic 4D scenes from a single static image remains a core challenge. Most existing generate-then-reconstruct and reconstruct-then-generate methods decouple geometry from motion, causing spatiotemporal…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-05 · Yanran Zhang, Ziyi Wang, Wenzhao Zheng, Zheng Zhu, Jie Zhou, Jiwen Lu

MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion

The spatio-temporal complexity of video data presents significant challenges in tasks such as compression, generation, and inpainting. We present four key contributions to address the challenges of spatiotemporal video processing. First, we…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-12 · Onkar Susladkar, Jishu Sen Gupta, Chirag Sehgal, Sparsh Mittal, Rekha Singhal

Sora Generates Videos with Stunning Geometrical Consistency

The recently developed Sora model [1] has exhibited remarkable capabilities in video generation, sparking intense discussions regarding its ability to simulate real-world phenomena. Despite its growing popularity, there is a lack of…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-28 · Xuanyi Li, Daquan Zhou, Chenxu Zhang, Shaodong Wei, Qibin Hou, Ming-Ming Cheng

MoRe: Motion-aware Feed-forward 4D Reconstruction Transformer

Reconstructing dynamic 4D scenes remains challenging due to the presence of moving objects that corrupt camera pose estimation. Existing optimization methods alleviate this issue with additional supervision, but they are mostly…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-09 · Juntong Fang, Zequn Chen, Weiqi Zhang, Donglin Di, Xuancheng Zhang, Chengmin Yang, Yu-Shen Liu

NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction

We present NOVA3R, an effective approach for non-pixel-aligned 3D reconstruction from a set of unposed images in a feed-forward manner. Unlike pixel-aligned methods that tie geometry to per-ray predictions, our formulation learns a global,…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-06 · Weirong Chen, Chuanxia Zheng, Ganlin Zhang, Andrea Vedaldi, Daniel Cremers

MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion

Estimating geometry from dynamic scenes, where objects move and deform over time, remains a core challenge in computer vision. Current approaches often rely on multi-stage pipelines or global optimizations that decompose the problem into…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-09 · Junyi Zhang, Charles Herrmann, Junhwa Hur, Varun Jampani, Trevor Darrell, Forrester Cole, Deqing Sun, Ming-Hsuan Yang

PAD3R: Pose-Aware Dynamic 3D Reconstruction from Casual Videos

We present PAD3R, a method for reconstructing deformable 3D objects from casually captured, unposed monocular videos. Unlike existing approaches, PAD3R handles long video sequences featuring substantial object deformation, large-scale…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-30 · Ting-Hsuan Liao, Haowen Liu, Yiran Xu, Songwei Ge, Gengshan Yang, Jia-Bin Huang

ViDAR: Video Diffusion-Aware 4D Reconstruction From Monocular Inputs

Dynamic Novel View Synthesis aims to generate photorealistic views of moving subjects from arbitrary viewpoints. This task is particularly challenging when relying on monocular video, where disentangling structure from motion is ill-posed…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-24 · Michal Nazarczuk, Sibi Catley-Chandar, Thomas Tanay, Zhensong Zhang, Gregory Slabaugh, Eduardo Pérez-Pellitero

Flow4R: Unifying 4D Reconstruction and Tracking with Scene Flow

Reconstructing and tracking dynamic 3D scenes remains a fundamental challenge in computer vision. Existing approaches often decouple geometry from motion: multi-view reconstruction methods assume static scenes, while dynamic tracking…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-17 · Shenhan Qian, Ganlin Zhang, Shangzhe Wu, Daniel Cremers

InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion

Recent advances in diffusion-based video generation have opened new possibilities for controllable video editing, yet realistic video object insertion (VOI) remains challenging due to limited 4D scene understanding and inadequate handling…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-22 · Hoiyeong Jin, Hyojin Jang, Jeongho Kim, Junha Hyung, Kinam Kim, Dongjin Kim, Huijin Choi, Hyeonji Kim, Jaegul Choo

FreeOrbit4D: Training-Free Arbitrary Camera Redirection for Monocular Videos via Foreground-Complete 4D Reconstruction

Camera redirection aims to replay a dynamic scene from a single monocular video under a user-specified camera trajectory. However, large-angle redirection is inherently ill-posed: a monocular video captures only a narrow spatio-temporal…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-20 · Wei Cao, Hao Zhang, Fengrui Tian, Yulun Wu, Yingying Li, Shenlong Wang, Ning Yu, Yaoyao Liu