Related papers: 3D-OES: Viewpoint-Invariant Object-Factorized Envi…

3D Neural Scene Representations for Visuomotor Control

Humans have a strong intuitive understanding of the 3D environment around us. The mental model of the physics in our brain applies to objects of different materials and enables us to perform a wide range of manipulation tasks that are far…

Robotics · Computer Science · 2021-11-15 · Yunzhu Li, Shuang Li, Vincent Sitzmann, Pulkit Agrawal, Antonio Torralba

3D-IntPhys: Towards More Generalized 3D-grounded Visual Intuitive Physics under Challenging Scenes

Given a visual scene, humans have strong intuitions about how a scene can evolve over time under given actions. The intuition, often termed visual intuitive physics, is a critical ability that allows us to make effective plans to manipulate…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-25 · Haotian Xue, Antonio Torralba, Joshua B. Tenenbaum, Daniel LK Yamins, Yunzhu Li, Hsiao-Yu Tung

Unsupervised Learning for Physical Interaction through Video Prediction

A core challenge for an agent learning to interact with the world is to predict how its actions affect objects in its environment. Many existing methods for learning the dynamics of physical interactions require labeled object information.…

Machine Learning · Computer Science · 2016-10-19 · Chelsea Finn, Ian Goodfellow, Sergey Levine

Predicting the Physical Dynamics of Unseen 3D Objects

Machines that can predict the effect of physical interactions on the dynamics of previously unseen object instances are important for creating better robots and interactive virtual worlds. In this work, we focus on predicting the dynamics…

Computer Vision and Pattern Recognition · Computer Science · 2020-01-20 · Davis Rempe, Srinath Sridhar, He Wang, Leonidas J. Guibas

Learning 3D Persistent Embodied World Models

The ability to simulate the effects of future actions on the world is a crucial ability of intelligent embodied agents, enabling agents to anticipate the effects of their actions and make plans accordingly. While a large body of existing…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-12 · Siyuan Zhou, Yilun Du, Yuncong Yang, Lei Han, Peihao Chen, Dit-Yan Yeung, Chuang Gan

3D Object Recognition By Corresponding and Quantizing Neural 3D Scene Representations

We propose a system that learns to detect objects and infer their 3D poses in RGB-D images. Many existing systems can identify objects and infer 3D poses, but they heavily rely on human labels and 3D annotations. The challenge here is to…

Computer Vision and Pattern Recognition · Computer Science · 2020-11-02 · Mihir Prabhudesai, Shamit Lal, Hsiao-Yu Fish Tung, Adam W. Harley, Shubhankar Potdar, Katerina Fragkiadaki

Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling

Videos of robots interacting with objects encode rich information about the objects' dynamics. However, existing video prediction approaches typically do not explicitly account for the 3D information from videos, such as robot actions and…

Robotics · Computer Science · 2024-10-25 · Mingtong Zhang, Kaifeng Zhang, Yunzhu Li

OCK: Unsupervised Dynamic Video Prediction with Object-Centric Kinematics

Human perception involves decomposing complex multi-object scenes into time-static object appearance (i.e., size, shape, color) and time-varying object motion (i.e., position, velocity, acceleration). For machines to achieve human-like…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-22 · Yeon-Ji Song, Jaein Kim, Suhyung Choi, Jin-Hwa Kim, Byoung-Tak Zhang

ObjectForesight: Predicting Future 3D Object Trajectories from Human Videos

Humans can effortlessly anticipate how objects might move or change through interaction--imagining a cup being lifted, a knife slicing, or a lid being closed. We aim to endow computational systems with a similar ability to predict plausible…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-24 · Rustin Soraki, Homanga Bharadhwaj, Ali Farhadi, Roozbeh Mottaghi

Learning Physical Dynamics for Object-centric Visual Prediction

The ability to model the underlying dynamics of visual scenes and reason about the future is central to human intelligence. Many attempts have been made to empower intelligent systems with such physical understanding and prediction…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-18 · Huilin Xu, Tao Chen, Feng Xu

Learning Intuitive Physics with Multimodal Generative Models

Predicting the future interaction of objects when they come into contact with their environment is key for autonomous agents to take intelligent and anticipatory actions. This paper presents a perception framework that fuses visual and…

Machine Learning · Computer Science · 2021-01-21 · Sahand Rezaei-Shoshtari, Francois Robert Hogan, Michael Jenkin, David Meger, Gregory Dudek

Variational Inference for Scalable 3D Object-centric Learning

We tackle the task of scalable unsupervised object-centric representation learning on 3D scenes. Existing approaches to object-centric representation learning show limitations in generalizing to larger scenes as their learning processes…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-26 · Tianyu Wang, Kee Siong Ng, Miaomiao Liu

Multi-Object Manipulation via Object-Centric Neural Scattering Functions

Learned visual dynamics models have proven effective for robotic manipulation tasks. Yet, it remains unclear how best to represent scenes involving multi-object interactions. Current methods decompose a scene into discrete objects, but they…

Robotics · Computer Science · 2023-06-16 · Stephen Tian, Yancheng Cai, Hong-Xing Yu, Sergey Zakharov, Katherine Liu, Adrien Gaidon, Yunzhu Li, Jiajun Wu

Kinematics-Guided Reinforcement Learning for Object-Aware 3D Ego-Pose Estimation

We propose a method for incorporating object interaction and human body dynamics into the task of 3D ego-pose estimation using a head-mounted camera. We use a kinematics model of the human body to represent the entire range of human motion,…

Computer Vision and Pattern Recognition · Computer Science · 2020-12-10 · Zhengyi Luo, Ryo Hachiuma, Ye Yuan, Shun Iwase, Kris M. Kitani

E-Motion: Future Motion Simulation via Event Sequence Diffusion

Forecasting a typical object's future motion is a critical task for interpreting and interacting with dynamic environments in computer vision. Event-based sensors, which could capture changes in the scene with exceptional temporal…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-14 · Song Wu, Zhiyu Zhu, Junhui Hou, Guangming Shi, Jinjian Wu

Neural Groundplans: Persistent Neural Scene Representations from a Single Image

We present a method to map 2D image observations of a scene to a persistent 3D scene representation, enabling novel view synthesis and disentangled representation of the movable and immovable components of the scene. Motivated by the…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-11 · Prafull Sharma, Ayush Tewari, Yilun Du, Sergey Zakharov, Rares Ambrus, Adrien Gaidon, William T. Freeman, Fredo Durand, Joshua B. Tenenbaum, Vincent Sitzmann

Tracking Emerges by Looking Around Static Scenes, with Neural 3D Mapping

We hypothesize that an agent that can look around in static scenes can learn rich visual representations applicable to 3D object tracking in complex dynamic scenes. We are motivated in this pursuit by the fact that the physical world itself…

Computer Vision and Pattern Recognition · Computer Science · 2020-08-05 · Adam W. Harley, Shrinidhi K. Lakshmikanth, Paul Schydlo, Katerina Fragkiadaki

Conditional Object-Centric Learning from Video

Object-centric representations are a promising path toward more systematic generalization by providing flexible abstractions upon which compositional world models can be built. Recent work on simple 2D and 3D datasets has shown that models…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-16 · Thomas Kipf, Gamaleldin F. Elsayed, Aravindh Mahendran, Austin Stone, Sara Sabour, Georg Heigold, Rico Jonschkowski, Alexey Dosovitskiy, Klaus Greff

Neural World Models for Computer Vision

Humans navigate in their environment by learning a mental model of the world through passive observation and active interaction. Their world model allows them to anticipate what might happen next and act accordingly with respect to an…

Computer Vision and Pattern Recognition · Computer Science · 2023-06-16 · Anthony Hu

Neural Implicit Representations for Physical Parameter Inference from a Single Video

Neural networks have recently been used to analyze diverse physical systems and to identify the underlying dynamics. While existing methods achieve impressive results, they are limited by their strong demand for training data and their weak…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-03 · Florian Hofherr, Lukas Koestler, Florian Bernard, Daniel Cremers