Related papers: Seeing World Dynamics in a Nutshell

200 papers

Understanding dynamic scenes from casual videos is critical for scalable robot learning, yet four-dimensional (4D) reconstruction under strictly monocular settings remains highly ill-posed. To address this challenge, our key insight is that…

Computer Vision and Pattern Recognition · Computer Science 2026-02-17 Can Li , Jie Gu , Jingmin Chen , Fangzhou Qiu , Lei Sun

Recovering a 4D world from monocular video is a crucial yet challenging task. Conventional methods usually rely on the assumptions of multi-view videos, known camera parameters, or static scenes. In this paper, we relax all these constraints…

Computer Vision and Pattern Recognition · Computer Science 2025-01-03 Shizun Wang , Xingyi Yang , Qiuhong Shen , Zhenxiang Jiang , Xinchao Wang

We propose 4DGT, a 4D Gaussian-based Transformer model for dynamic scene reconstruction, trained entirely on real-world monocular posed videos. Using 4D Gaussian as an inductive bias, 4DGT unifies static and dynamic components, enabling the…

Computer Vision and Pattern Recognition · Computer Science 2025-12-02 Zhen Xu , Zhengqin Li , Zhao Dong , Xiaowei Zhou , Richard Newcombe , Zhaoyang Lv

Gaussian splatting has become a popular representation for novel-view synthesis, exhibiting clear strengths in efficiency, photometric quality, and compositional editability. Following its success, many works have extended Gaussians to 4D,…

Computer Vision and Pattern Recognition · Computer Science 2024-09-12 Colton Stearns , Adam Harley , Mikaela Uy , Florian Dubost , Federico Tombari , Gordon Wetzstein , Leonidas Guibas

Efficient neural representations for dynamic video scenes are critical for applications ranging from video compression to interactive simulations. Yet, existing methods often face challenges related to high memory usage, lengthy training…

Computer Vision and Pattern Recognition · Computer Science 2025-01-10 Andrew Bond , Jui-Hsien Wang , Long Mai , Erkut Erdem , Aykut Erdem

Building an efficient and physically consistent world model from limited observations is a long-standing challenge in vision and robotics. Many existing world modeling pipelines are based on implicit generative models, which are hard to…

Computer Vision and Pattern Recognition · Computer Science 2025-06-06 Wenhao Hu , Xuexiang Wen , Xi Li , Gaoang Wang

Video representation is a long-standing problem that is crucial for various downstream tasks, such as tracking, depth prediction, segmentation, view synthesis, and editing. However, current methods either struggle to model complex motions due…

Computer Vision and Pattern Recognition · Computer Science 2024-06-27 Yang-Tian Sun , Yi-Hua Huang , Lin Ma , Xiaoyang Lyu , Yan-Pei Cao , Xiaojuan Qi

In this paper, we propose MoDGS, a new pipeline to render novel views of dynamic scenes from a casually captured monocular video. Previous monocular dynamic NeRF or Gaussian Splatting methods strongly rely on the rapid movement of input…

Computer Vision and Pattern Recognition · Computer Science 2025-05-20 Qingming Liu , Yuan Liu , Jiepeng Wang , Xianqiang Lyv , Peng Wang , Wenping Wang , Junhui Hou

Personalized 3D avatars require an animatable representation of digital humans. Doing so instantly from monocular videos offers scalability to a broad class of users and wide-scale applications. In this paper, we present a fast, simple, yet…

Computer Vision and Pattern Recognition · Computer Science 2024-07-17 Pramish Paudel , Anubhav Khanal , Ajad Chhatkuli , Danda Pani Paudel , Jyoti Tandukar

Reconstructing dynamic 3D scenes from monocular videos is a fundamental yet highly challenging task, as real-world motions often involve both long-term smooth transformations and short-term complex deformations. Existing methods either…

Computer Vision and Pattern Recognition · Computer Science 2026-05-25 Chenyu Wu , Wanhua Li , Zhu-Tian Chen , Hanspeter Pfister

We introduce a novel framework for reconstructing dynamic human-object interactions from monocular video that overcomes challenges associated with occlusions and temporal inconsistencies. Traditional 3D reconstruction methods typically…

Computer Vision and Pattern Recognition · Computer Science 2025-09-16 Hyungjun Doh , Dong In Lee , Seunggeun Chi , Pin-Hao Huang , Kwonjoon Lee , Sangpil Kim , Karthik Ramani

This paper addresses the challenge of novel-view synthesis and motion reconstruction of dynamic scenes from monocular video, which is critical for many robotic applications. Although Neural Radiance Fields (NeRF) and 3D Gaussian Splatting…

Robotics · Computer Science 2025-08-12 Xuesong Li , Lars Petersson , Vivien Rolland

In this paper, we present a method to reconstruct the world and multiple dynamic humans in 3D from a monocular video input. As a key idea, we represent both the world and multiple humans via the recently emerging 3D Gaussian Splatting…

Computer Vision and Pattern Recognition · Computer Science 2024-04-23 Inhee Lee , Byungjun Kim , Hanbyul Joo

Humans excel at forecasting the future dynamics of a scene given just a single image. Video generation models that can mimic this ability are an essential component for intelligent systems. Recent approaches have improved temporal coherence…

Computer Vision and Pattern Recognition · Computer Science 2026-05-18 Melonie de Almeida , Daniela Ivanova , Tong Shi , John H. Williamson , Paul Henderson

This paper aims to address the challenge of reconstructing long volumetric videos from multi-view RGB videos. Recent dynamic view synthesis methods leverage powerful 4D representations, like feature grids or point cloud sequences, to…

Computer Vision and Pattern Recognition · Computer Science 2024-12-13 Zhen Xu , Yinghao Xu , Zhiyuan Yu , Sida Peng , Jiaming Sun , Hujun Bao , Xiaowei Zhou

Monocular 3D tracking aims to capture the long-term motion of pixels in 3D space from a single monocular video and has witnessed rapid progress in recent years. However, we argue that the existing monocular 3D tracking methods still fall…

Computer Vision and Pattern Recognition · Computer Science 2025-12-10 Jiahao Lu , Weitao Xiong , Jiacheng Deng , Peng Li , Tianyu Huang , Zhiyang Dou , Cheng Lin , Sai-Kit Yeung , Yuan Liu

Volumetric video represents a transformative advancement in visual media, enabling users to freely navigate immersive virtual experiences and narrowing the gap between digital and real worlds. However, the need for extensive manual…

Graphics · Computer Science 2024-09-16 Yuheng Jiang , Zhehao Shen , Yu Hong , Chengcheng Guo , Yize Wu , Yingliang Zhang , Jingyi Yu , Lan Xu

Visual understanding of the world goes beyond the semantics and flat structure of individual images. In this work, we aim to capture both the 3D structure and dynamics of real-world scenes from monocular real-world videos. Our Dynamic Scene…

Computer Vision and Pattern Recognition · Computer Science 2024-03-18 Maximilian Seitzer , Sjoerd van Steenkiste , Thomas Kipf , Klaus Greff , Mehdi S. M. Sajjadi

To achieve realistic immersion in landscape images, fluids such as water and clouds need to move within the image while revealing new scenes from various camera perspectives. Recently, a field called dynamic scene video has emerged, which…

Computer Vision and Pattern Recognition · Computer Science 2025-04-09 In-Hwan Jin , Haesoo Choo , Seong-Hun Jeong , Heemoon Park , Junghwan Kim , Oh-joon Kwon , Kyeongbo Kong

Estimating the 3D trajectory of every pixel from a monocular video is crucial and promising for a comprehensive understanding of the 3D dynamics of videos. Recent monocular 3D tracking works demonstrate impressive performance, but are…

Computer Vision and Pattern Recognition · Computer Science 2026-03-06 Jiahao Lu , Jiayi Xu , Wenbo Hu , Ruijie Zhu , Chengfeng Zhao , Sai-Kit Yeung , Ying Shan , Yuan Liu