Related papers: Seeing World Dynamics in a Nutshell

Gaussian Sequences with Multi-Scale Dynamics for 4D Reconstruction from Monocular Casual Videos

Understanding dynamic scenes from casual videos is critical for scalable robot learning, yet four-dimensional (4D) reconstruction under strictly monocular settings remains highly ill-posed. To address this challenge, our key insight is that…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-17 · Can Li, Jie Gu, Jingmin Chen, Fangzhou Qiu, Lei Sun

GFlow: Recovering 4D World from Monocular Video

Recovering 4D world from monocular video is a crucial yet challenging task. Conventional methods usually rely on the assumptions of multi-view videos, known camera parameters, or static scenes. In this paper, we relax all these constraints…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-03 · Shizun Wang, Xingyi Yang, Qiuhong Shen, Zhenxiang Jiang, Xinchao Wang

4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos

We propose 4DGT, a 4D Gaussian-based Transformer model for dynamic scene reconstruction, trained entirely on real-world monocular posed videos. Using 4D Gaussian as an inductive bias, 4DGT unifies static and dynamic components, enabling the…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-02 · Zhen Xu, Zhengqin Li, Zhao Dong, Xiaowei Zhou, Richard Newcombe, Zhaoyang Lv

Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos

Gaussian splatting has become a popular representation for novel-view synthesis, exhibiting clear strengths in efficiency, photometric quality, and compositional editability. Following its success, many works have extended Gaussians to 4D,…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-12 · Colton Stearns, Adam Harley, Mikaela Uy, Florian Dubost, Federico Tombari, Gordon Wetzstein, Leonidas Guibas

GaussianVideo: Efficient Video Representation via Hierarchical Gaussian Splatting

Efficient neural representations for dynamic video scenes are critical for applications ranging from video compression to interactive simulations. Yet, existing methods often face challenges related to high memory usage, lengthy training…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-10 · Andrew Bond, Jui-Hsien Wang, Long Mai, Erkut Erdem, Aykut Erdem

DSG-World: Learning a 3D Gaussian World Model from Dual State Videos

Building an efficient and physically consistent world model from limited observations is a long-standing challenge in vision and robotics. Many existing world modeling pipelines are based on implicit generative models, which are hard to…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-06 · Wenhao Hu, Xuexiang Wen, Xi Li, Gaoang Wang

Splatter a Video: Video Gaussian Representation for Versatile Processing

Video representation is a long-standing problem that is crucial for various downstream tasks, such as tracking, depth prediction, segmentation, view synthesis, and editing. However, current methods either struggle to model complex motions due…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-27 · Yang-Tian Sun, Yi-Hua Huang, Lin Ma, Xiaoyang Lyu, Yan-Pei Cao, Xiaojuan Qi

MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos with Depth Priors

In this paper, we propose MoDGS, a new pipeline to render novel views of dynamic scenes from a casually captured monocular video. Previous monocular dynamic NeRF or Gaussian Splatting methods strongly rely on the rapid movement of input…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-20 · Qingming Liu, Yuan Liu, Jiepeng Wang, Xianqiang Lyv, Peng Wang, Wenping Wang, Junhui Hou

iHuman: Instant Animatable Digital Humans From Monocular Videos

Personalized 3D avatars require an animatable representation of digital humans. Doing so instantly from monocular videos offers scalability to a broad class of users and wide-scale applications. In this paper, we present a fast, simple, yet…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-17 · Pramish Paudel, Anubhav Khanal, Ajad Chhatkuli, Danda Pani Paudel, Jyoti Tandukar

RiGS: Rigid-aware 4D Gaussian Splatting from a Single Monocular Video

Reconstructing dynamic 3D scenes from monocular videos is a fundamental yet highly challenging task, as real-world motions often involve both long-term smooth transformations and short-term complex deformations. Existing methods either…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-25 · Chenyu Wu, Wanhua Li, Zhu-Tian Chen, Hanspeter Pfister

Occlusion-Aware Temporally Consistent Amodal Completion for 3D Human-Object Interaction Reconstruction

We introduce a novel framework for reconstructing dynamic human-object interactions from monocular video that overcomes challenges associated with occlusions and temporal inconsistencies. Traditional 3D reconstruction methods typically…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-16 · Hyungjun Doh, Dong In Lee, Seunggeun Chi, Pin-Hao Huang, Kwonjoon Lee, Sangpil Kim, Karthik Ramani

3D Gaussian Representations with Motion Trajectory Field for Dynamic Scene Reconstruction

This paper addresses the challenge of novel-view synthesis and motion reconstruction of dynamic scenes from monocular video, which is critical for many robotic applications. Although Neural Radiance Fields (NeRF) and 3D Gaussian Splatting…

Robotics · Computer Science · 2025-08-12 · Xuesong Li, Lars Petersson, Vivien Rolland

Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses

In this paper, we present a method to reconstruct the world and multiple dynamic humans in 3D from a monocular video input. As a key idea, we represent both the world and multiple humans via the recently emerging 3D Gaussian Splatting…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-23 · Inhee Lee, Byungjun Kim, Hanbyul Joo

Pixel-to-4D: Camera-Controlled Image-to-Video Generation with Dynamic 3D Gaussians

Humans excel at forecasting the future dynamics of a scene given just a single image. Video generation models that can mimic this ability are an essential component for intelligent systems. Recent approaches have improved temporal coherence…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-18 · Melonie de Almeida, Daniela Ivanova, Tong Shi, John H. Williamson, Paul Henderson

Representing Long Volumetric Video with Temporal Gaussian Hierarchy

This paper aims to address the challenge of reconstructing long volumetric videos from multi-view RGB videos. Recent dynamic view synthesis methods leverage powerful 4D representations, like feature grids or point cloud sequences, to…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-13 · Zhen Xu, Yinghao Xu, Zhiyuan Yu, Sida Peng, Jiaming Sun, Hujun Bao, Xiaowei Zhou

TrackingWorld: World-centric Monocular 3D Tracking of Almost All Pixels

Monocular 3D tracking aims to capture the long-term motion of pixels in 3D space from a single monocular video and has witnessed rapid progress in recent years. However, we argue that the existing monocular 3D tracking methods still fall…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-10 · Jiahao Lu, Weitao Xiong, Jiacheng Deng, Peng Li, Tianyu Huang, Zhiyang Dou, Cheng Lin, Sai-Kit Yeung, Yuan Liu

Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos

Volumetric video represents a transformative advancement in visual media, enabling users to freely navigate immersive virtual experiences and narrowing the gap between digital and real worlds. However, the need for extensive manual…

Graphics · Computer Science · 2024-09-16 · Yuheng Jiang, Zhehao Shen, Yu Hong, Chengcheng Guo, Yize Wu, Yingliang Zhang, Jingyi Yu, Lan Xu

DyST: Towards Dynamic Neural Scene Representations on Real-World Videos

Visual understanding of the world goes beyond the semantics and flat structure of individual images. In this work, we aim to capture both the 3D structure and dynamics of real-world scenes from monocular real-world videos. Our Dynamic Scene…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-18 · Maximilian Seitzer, Sjoerd van Steenkiste, Thomas Kipf, Klaus Greff, Mehdi S. M. Sajjadi

Optimizing 4D Gaussians for Dynamic Scene Video from Single Landscape Images

To achieve realistic immersion in landscape images, fluids such as water and clouds need to move within the image while revealing new scenes from various camera perspectives. Recently, a field called dynamic scene video has emerged, which…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-09 · In-Hwan Jin, Haesoo Choo, Seong-Hun Jeong, Heemoon Park, Junghwan Kim, Oh-joon Kwon, Kyeongbo Kong

Track4World: Feedforward World-centric Dense 3D Tracking of All Pixels

Estimating the 3D trajectory of every pixel from a monocular video is crucial and promising for a comprehensive understanding of the 3D dynamics of videos. Recent monocular 3D tracking works demonstrate impressive performance, but are…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-06 · Jiahao Lu, Jiayi Xu, Wenbo Hu, Ruijie Zhu, Chengfeng Zhao, Sai-Kit Yeung, Ying Shan, Yuan Liu