Related papers: Experience-Embedded Visual Foresight

Visual Perception Generalization for Vision-and-Language Navigation via Meta-Learning

Vision-and-language navigation (VLN) is a challenging task that requires an agent to navigate in real-world environments by understanding natural language instructions and visual information received in real-time. Prior works have…

Robotics · Computer Science 2021-01-20 Ting Wang, Zongkai Wu, Donglin Wang

Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL

Embodied visual tracking is to follow a target object in dynamic 3D environments using an agent's egocentric vision. This is a vital and challenging skill for embodied agents. However, existing methods suffer from inefficient training and…

Computer Vision and Pattern Recognition · Computer Science 2024-07-23 Fangwei Zhong, Kui Wu, Hai Ci, Churan Wang, Hao Chen

MaskViT: Masked Visual Pre-Training for Video Prediction

The ability to predict future visual observations conditioned on past observations and motor commands can enable embodied agents to plan solutions to a variety of tasks in complex environments. This work shows that we can create good video…

Computer Vision and Pattern Recognition · Computer Science 2022-08-09 Agrim Gupta, Stephen Tian, Yunzhi Zhang, Jiajun Wu, Roberto Martín-Martín, Li Fei-Fei

Training and Predicting Visual Error for Real-Time Applications

Visual error metrics play a fundamental role in the quantification of perceived image similarity. Most recently, use cases for them in real-time applications have emerged, such as content-adaptive shading and shading reuse to increase…

Graphics · Computer Science 2023-10-16 João Libório Cardoso, Bernhard Kerbl, Lei Yang, Yury Uralsky, Michael Wimmer

The Empirical Impact of Forgetting and Transfer in Continual Visual Odometry

As robotics continues to advance, the need for adaptive and continuously-learning embodied agents increases, particularly in the realm of assistance robotics. Quick adaptability and long-term information retention are essential to operate…

Computer Vision and Pattern Recognition · Computer Science 2024-06-05 Paolo Cudrano, Xiaoyu Luo, Matteo Matteucci

Visual Grounding of Learned Physical Models

Humans intuitively recognize objects' physical properties and predict their motion, even when the objects are engaged in complicated interactions. The abilities to perform physical reasoning and to adapt to new environments, while intrinsic…

Machine Learning · Computer Science 2020-06-30 Yunzhu Li, Toru Lin, Kexin Yi, Daniel M. Bear, Daniel L. K. Yamins, Jiajun Wu, Joshua B. Tenenbaum, Antonio Torralba

Embodied vision for learning object representations

Recent time-contrastive learning approaches manage to learn invariant object representations without supervision. This is achieved by mapping successive views of an object onto close-by internal representations. When considering this…

Machine Learning · Computer Science 2022-05-13 Arthur Aubret, Céline Teulière, Jochen Triesch

EVT: Efficient View Transformation for Multi-Modal 3D Object Detection

Multi-modal sensor fusion in Bird's Eye View (BEV) representation has become the leading approach for 3D object detection. However, existing methods often rely on depth estimators or transformer encoders to transform image features into BEV…

Computer Vision and Pattern Recognition · Computer Science 2025-07-14 Yongjin Lee, Hyeon-Mun Jeong, Yurim Jeon, Sanghyun Kim

Visual Forecasting as a Mid-level Representation for Avoidance

The challenge of navigation in environments with dynamic objects continues to be a central issue in the study of autonomous agents. While predictive methods hold promise, their reliance on precise state information makes them less practical…

Robotics · Computer Science 2024-10-28 Hsuan-Kung Yang, Tsung-Chih Chiang, Ting-Ru Liu, Chun-Wei Huang, Jou-Min Liu, Chun-Yi Lee

Embodied Visual Recognition

Passive visual systems typically fail to recognize objects in the amodal setting where they are heavily occluded. In contrast, humans and other embodied agents have the ability to move in the environment, and actively control the viewing…

Computer Vision and Pattern Recognition · Computer Science 2019-04-10 Jianwei Yang, Zhile Ren, Mingze Xu, Xinlei Chen, David Crandall, Devi Parikh, Dhruv Batra

Visual Interaction Networks

From just a glance, humans can make rich predictions about the future state of a wide range of physical systems. On the other hand, modern approaches from engineering, robotics, and graphics are often restricted to narrow domains and…

Computer Vision and Pattern Recognition · Computer Science 2017-06-06 Nicholas Watters, Andrea Tacchetti, Theophane Weber, Razvan Pascanu, Peter Battaglia, Daniel Zoran

Learning Answer Embeddings for Visual Question Answering

We propose a novel probabilistic model for visual question answering (Visual QA). The key idea is to infer two sets of embeddings: one for the image and the question jointly and the other for the answers. The learning objective is to learn…

Computer Vision and Pattern Recognition · Computer Science 2018-06-12 Hexiang Hu, Wei-Lun Chao, Fei Sha

Envision: Embodied Visual Planning via Goal-Imagery Video Diffusion

Embodied visual planning aims to enable manipulation tasks by imagining how a scene evolves toward a desired goal and using the imagined trajectories to guide actions. Video diffusion models, through their image-to-video generation…

Computer Vision and Pattern Recognition · Computer Science 2025-12-30 Yuming Gu, Yizhi Wang, Yining Hong, Yipeng Gao, Hao Jiang, Angtian Wang, Bo Liu, Nathaniel S. Dennler, Zhengfei Kuang, Hao Li, Gordon Wetzstein, Chongyang Ma

Subject Adaptive EEG-based Visual Recognition

This paper focuses on EEG-based visual recognition, aiming to predict the visual object class observed by a subject based on his/her EEG signals. One of the main challenges is the large variation between signals from different subjects. It…

Computer Vision and Pattern Recognition · Computer Science 2021-10-27 Pilhyeon Lee, Sunhee Hwang, Seogkyu Jeon, Hyeran Byun

E-Motion: Future Motion Simulation via Event Sequence Diffusion

Forecasting a typical object's future motion is a critical task for interpreting and interacting with dynamic environments in computer vision. Event-based sensors, which could capture changes in the scene with exceptional temporal…

Computer Vision and Pattern Recognition · Computer Science 2024-10-14 Song Wu, Zhiyu Zhu, Junhui Hou, Guangming Shi, Jinjian Wu

Unsupervised Learning for Physical Interaction through Video Prediction

A core challenge for an agent learning to interact with the world is to predict how its actions affect objects in its environment. Many existing methods for learning the dynamics of physical interactions require labeled object information.…

Machine Learning · Computer Science 2016-10-19 Chelsea Finn, Ian Goodfellow, Sergey Levine

Imitation Learning-based Visual Servoing for Tracking Moving Objects

In everyday life collaboration tasks between human operators and robots, the former necessitate simple ways for programming new skills, the latter have to show adaptive capabilities to cope with environmental changes. The joint use of…

Robotics · Computer Science 2023-09-15 Rocco Felici, Matteo Saveriano, Loris Roveda, Antonio Paolillo

FoV-Net: Field-of-View Extrapolation Using Self-Attention and Uncertainty

The ability to make educated predictions about their surroundings, and associate them with certain confidence, is important for intelligent systems, like autonomous vehicles and robots. It allows them to plan early and decide accordingly.…

Computer Vision and Pattern Recognition · Computer Science 2022-04-05 Liqian Ma, Stamatios Georgoulis, Xu Jia, Luc Van Gool

ViT-VS: On the Applicability of Pretrained Vision Transformer Features for Generalizable Visual Servoing

Visual servoing enables robots to precisely position their end-effector relative to a target object. While classical methods rely on hand-crafted features and thus are universally applicable without task-specific training, they often…

Robotics · Computer Science 2025-07-14 Alessandro Scherl, Stefan Thalhammer, Bernhard Neuberger, Wilfried Wöber, José García-Rodríguez

Relational Neural Expectation Maximization: Unsupervised Discovery of Objects and their Interactions

Common-sense physical reasoning is an essential ingredient for any intelligent agent operating in the real-world. For example, it can be used to simulate the environment, or to infer the state of parts of the world that are currently…

Machine Learning · Computer Science 2018-03-01 Sjoerd van Steenkiste, Michael Chang, Klaus Greff, Jürgen Schmidhuber