Related papers: Environment Predictive Coding for Embodied Agents

Explore and Explain: Self-supervised Navigation and Recounting

Embodied AI has been recently gaining attention as it aims to foster the development of autonomous and intelligent agents. In this paper, we devise a novel embodied setting in which an agent needs to explore a previously unknown environment…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-16 · Roberto Bigazzi, Federico Landi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara

Embodied Visual Active Learning for Semantic Segmentation

We study the task of embodied visual active learning, where an agent is set to explore a 3d environment with the goal to acquire visual scene understanding by actively selecting views for which to request annotation. While accurate on some…

Computer Vision and Pattern Recognition · Computer Science · 2020-12-18 · David Nilsson, Aleksis Pirinen, Erik Gärtner, Cristian Sminchisescu

Towards Embodied Scene Description

Embodiment is an important characteristic for all intelligent agents (creatures and robots), while existing scene description tasks mainly focus on analyzing images passively and the semantic understanding of the scenario is separated from…

Robotics · Computer Science · 2020-05-08 · Sinan Tan, Huaping Liu, Di Guo, Xinyu Zhang, Fuchun Sun

Embodied Image Captioning: Self-supervised Learning Agents for Spatially Coherent Image Descriptions

We present a self-supervised method to improve an agent's abilities in describing arbitrary objects while actively exploring a generic environment. This is a challenging problem, as current models struggle to obtain coherent image captions…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-18 · Tommaso Galliena, Tommaso Apicella, Stefano Rosa, Pietro Morerio, Alessio Del Bue, Lorenzo Natale

Embodied Agents for Efficient Exploration and Smart Scene Description

The development of embodied agents that can communicate with humans in natural language has gained increasing interest over the last years, as it facilitates the diffusion of robotic platforms in human-populated environments. As a step…

Robotics · Computer Science · 2024-04-16 · Roberto Bigazzi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara

Embodied Visual Recognition

Passive visual systems typically fail to recognize objects in the amodal setting where they are heavily occluded. In contrast, humans and other embodied agents have the ability to move in the environment, and actively control the viewing…

Computer Vision and Pattern Recognition · Computer Science · 2019-04-10 · Jianwei Yang, Zhile Ren, Mingze Xu, Xinlei Chen, David Crandall, Devi Parikh, Dhruv Batra

Temporal Predictive Coding For Model-Based Planning In Latent Space

High-dimensional observations are a major challenge in the application of model-based reinforcement learning (MBRL) to real-world environments. To handle high-dimensional sensory inputs, existing approaches use representation learning to…

Machine Learning · Computer Science · 2021-06-15 · Tung Nguyen, Rui Shu, Tuan Pham, Hung Bui, Stefano Ermon

Learning Continuous Environment Fields via Implicit Functions

We propose a novel scene representation that encodes reaching distance: the distance between any position in the scene to a goal along a feasible trajectory. We demonstrate that this environment field representation can directly guide the…

Computer Vision and Pattern Recognition · Computer Science · 2021-11-30 · Xueting Li, Shalini De Mello, Xiaolong Wang, Ming-Hsuan Yang, Jan Kautz, Sifei Liu

Active Sensing with Predictive Coding and Uncertainty Minimization

We present an end-to-end procedure for embodied exploration inspired by two biological computations: predictive coding and uncertainty minimization. The procedure can be applied to exploration settings in a task-independent and…

Machine Learning · Computer Science · 2024-02-14 · Abdelrahman Sharafeldin, Nabil Imam, Hannah Choi

Probing Emergent Semantics in Predictive Agents via Question Answering

Recent work has shown how predictive modeling can endow agents with rich knowledge of their surroundings, improving their ability to act in complex environments. We propose question-answering as a general paradigm to decode and understand…

Artificial Intelligence · Computer Science · 2020-06-02 · Abhishek Das, Federico Carnevale, Hamza Merzic, Laura Rimell, Rosalia Schneider, Josh Abramson, Alden Hung, Arun Ahuja, Stephen Clark, Gregory Wayne, Felix Hill

Learning Affordance Landscapes for Interaction Exploration in 3D Environments

Embodied agents operating in human spaces must be able to master how their environment works: what objects can the agent use, and how can it use them? We introduce a reinforcement learning approach for exploration for interaction, whereby…

Computer Vision and Pattern Recognition · Computer Science · 2020-10-20 · Tushar Nagarajan, Kristen Grauman

Embodied Active Domain Adaptation for Semantic Segmentation via Informative Path Planning

This work presents an embodied agent that can adapt its semantic segmentation network to new indoor environments in a fully autonomous way. Because semantic segmentation networks fail to generalize well to unseen environments, the agent…

Robotics · Computer Science · 2022-07-05 · René Zurbrügg, Hermann Blum, Cesar Cadena, Roland Siegwart, Lukas Schmid

Learning to Explore Informative Trajectories and Samples for Embodied Perception

We are witnessing significant progress on perception models, specifically those trained on large-scale internet images. However, efficiently generalizing these perception models to unseen embodied tasks is insufficiently studied, which will…

Robotics · Computer Science · 2023-03-21 · Ya Jing, Tao Kong

Automated mapping of virtual environments with visual predictive coding

Humans construct internal cognitive maps of their environment directly from sensory inputs without access to a system of explicit coordinates or distance measurements. While machine learning algorithms like SLAM utilize specialized visual…

Neurons and Cognition · Quantitative Biology · 2024-04-19 · James Gornet, Matthew Thomson

An Exploration of Embodied Visual Exploration

Embodied computer vision considers perception for robots in novel, unstructured environments. Of particular importance is the embodied visual exploration problem: how might a robot equipped with a camera scope out a new environment? Despite…

Computer Vision and Pattern Recognition · Computer Science · 2020-08-24 · Santhosh K. Ramakrishnan, Dinesh Jayaraman, Kristen Grauman

Learning 3D Persistent Embodied World Models

The ability to simulate the effects of future actions on the world is a crucial ability of intelligent embodied agents, enabling agents to anticipate the effects of their actions and make plans accordingly. While a large body of existing…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-12 · Siyuan Zhou, Yilun Du, Yuncong Yang, Lei Han, Peihao Chen, Dit-Yan Yeung, Chuang Gan

Visual Hide and Seek

We train embodied agents to play Visual Hide and Seek where a prey must navigate in a simulated environment in order to avoid capture from a predator. We place a variety of obstacles in the environment for the prey to hide behind, and we…

Artificial Intelligence · Computer Science · 2019-10-18 · Boyuan Chen, Shuran Song, Hod Lipson, Carl Vondrick

Online Grounding of Symbolic Planning Domains in Unknown Environments

If a robotic agent wants to exploit symbolic planning techniques to achieve some goal, it must be able to properly ground an abstract planning domain in the environment in which it operates. However, if the environment is initially unknown…

Artificial Intelligence · Computer Science · 2022-04-11 · Leonardo Lamanna, Luciano Serafini, Alessandro Saetti, Alfonso Gerevini, Paolo Traverso

Efficient Latent Representations using Multiple Tasks for Autonomous Driving

Driving in the dynamic, multi-agent, and complex urban environment is a difficult task requiring a complex decision policy. The learning of such a policy requires a state representation that can encode the entire environment. Mid-level…

Robotics · Computer Science · 2020-03-03 · Eshagh Kargar, Ville Kyrki

SoundSpaces: Audio-Visual Navigation in 3D Environments

Moving around in the world is naturally a multisensory experience, but today's embodied agents are deaf, restricted to solely their visual perception of the environment. We introduce audio-visual navigation for complex, acoustically and…

Computer Vision and Pattern Recognition · Computer Science · 2020-08-25 · Changan Chen, Unnat Jain, Carl Schissler, Sebastia Vicenc Amengual Gari, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman