Related papers: Embodied Visual Recognition

Embodied Visual Active Learning for Semantic Segmentation

We study the task of embodied visual active learning, where an agent is set to explore a 3D environment with the goal of acquiring visual scene understanding by actively selecting views for which to request annotation. While accurate on some…

Computer Vision and Pattern Recognition · Computer Science · 2020-12-18 · David Nilsson, Aleksis Pirinen, Erik Gärtner, Cristian Sminchisescu

Embodied Learning for Lifelong Visual Perception

We study lifelong visual perception in an embodied setup, where we develop new models and compare various agents that navigate in buildings and occasionally request annotations which, in turn, are used to refine their visual perception…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-30 · David Nilsson, Aleksis Pirinen, Erik Gärtner, Cristian Sminchisescu

Deep Learning for Embodied Vision Navigation: A Survey

The "embodied visual navigation" problem requires an agent to navigate in a 3D environment relying mainly on its first-person observation. This problem has attracted rising attention in recent years due to its wide application in autonomous…

Robotics · Computer Science · 2021-10-12 · Fengda Zhu, Yi Zhu, Vincent CS Lee, Xiaodan Liang, Xiaojun Chang

Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL

Embodied visual tracking is the task of following a target object in dynamic 3D environments using an agent's egocentric vision. This is a vital and challenging skill for embodied agents. However, existing methods suffer from inefficient training and…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-23 · Fangwei Zhong, Kui Wu, Hai Ci, Churan Wang, Hao Chen

Move to See Better: Self-Improving Embodied Object Detection

Passive methods for object detection and segmentation treat images of the same scene as individual samples and do not exploit object permanence across multiple views. Generalization to novel or difficult viewpoints thus requires additional…

Computer Vision and Pattern Recognition · Computer Science · 2021-03-30 · Zhaoyuan Fang, Ayush Jain, Gabriel Sarch, Adam W. Harley, Katerina Fragkiadaki

Embodied vision for learning object representations

Recent time-contrastive learning approaches manage to learn invariant object representations without supervision. This is achieved by mapping successive views of an object onto close-by internal representations. When considering this…

Machine Learning · Computer Science · 2022-05-13 · Arthur Aubret, Céline Teulière, Jochen Triesch

Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters

In Vision-and-Language Navigation (VLN), an embodied agent needs to reach a target destination guided only by a natural language instruction. To explore the environment and progress towards the target location, the agent must…

Computer Vision and Pattern Recognition · Computer Science · 2019-09-26 · Federico Landi, Lorenzo Baraldi, Massimiliano Corsini, Rita Cucchiara

Environment Predictive Coding for Embodied Agents

We introduce environment predictive coding, a self-supervised approach to learn environment-level representations for embodied agents. In contrast to prior work on self-supervised learning for images, we aim to jointly encode a series of…

Computer Vision and Pattern Recognition · Computer Science · 2021-02-05 · Santhosh K. Ramakrishnan, Tushar Nagarajan, Ziad Al-Halah, Kristen Grauman

Towards Embodied Scene Description

Embodiment is an important characteristic for all intelligent agents (creatures and robots), while existing scene description tasks mainly focus on analyzing images passively and the semantic understanding of the scenario is separated from…

Robotics · Computer Science · 2020-05-08 · Sinan Tan, Huaping Liu, Di Guo, Xinyu Zhang, Fuchun Sun

Embodied Active Learning of Generative Sensor-Object Models

When a robot encounters a novel object, how should it respond (what data should it collect) so that it can find the object in the future? In this work, we present a method for learning image features of an…

Robotics · Computer Science · 2024-10-16 · Allison Pinosky, Todd D. Murphey

Object Manipulation via Visual Target Localization

Object manipulation is a critical skill required for Embodied AI agents interacting with the world around them. Training agents to manipulate objects poses many challenges. These include occlusion of the target object by the agent's arm,…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-16 · Kiana Ehsani, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi

An Exploration of Embodied Visual Exploration

Embodied computer vision considers perception for robots in novel, unstructured environments. Of particular importance is the embodied visual exploration problem: how might a robot equipped with a camera scope out a new environment? Despite…

Computer Vision and Pattern Recognition · Computer Science · 2020-08-24 · Santhosh K. Ramakrishnan, Dinesh Jayaraman, Kristen Grauman

Embodied Agents for Efficient Exploration and Smart Scene Description

The development of embodied agents that can communicate with humans in natural language has gained increasing interest over the last years, as it facilitates the diffusion of robotic platforms in human-populated environments. As a step…

Robotics · Computer Science · 2024-04-16 · Roberto Bigazzi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara

Look, Listen, and Act: Towards Audio-Visual Embodied Navigation

A crucial ability of mobile intelligent agents is to integrate the evidence from multiple sensory inputs in an environment and to make a sequence of actions to reach their goals. In this paper, we attempt to approach the problem of…

Computer Vision and Pattern Recognition · Computer Science · 2020-03-10 · Chuang Gan, Yiwei Zhang, Jiajun Wu, Boqing Gong, Joshua B. Tenenbaum

Embodied Understanding of Driving Scenarios

Embodied scene understanding serves as the cornerstone for autonomous agents to perceive, interpret, and respond to open driving scenarios. Such understanding is typically founded upon Vision-Language Models (VLMs). Nevertheless, existing…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-08 · Yunsong Zhou, Linyan Huang, Qingwen Bu, Jia Zeng, Tianyu Li, Hang Qiu, Hongzi Zhu, Minyi Guo, Yu Qiao, Hongyang Li

Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation

Embodied scene understanding requires not only comprehending visual-spatial information that has been observed but also determining where to explore next in the 3D physical world. Existing 3D Vision-Language (3D-VL) models primarily focus…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-31 · Ziyu Zhu, Xilin Wang, Yixuan Li, Zhuofan Zhang, Xiaojian Ma, Yixin Chen, Baoxiong Jia, Wei Liang, Qian Yu, Zhidong Deng, Siyuan Huang, Qing Li

Embodied AI Agents: Modeling the World

This paper describes our research on AI agents embodied in visual, virtual or physical forms, enabling them to interact with both users and their environments. These agents, which include virtual avatars, wearable devices, and robots, are…

Artificial Intelligence · Computer Science · 2025-07-08 · Pascale Fung, Yoram Bachrach, Asli Celikyilmaz, Kamalika Chaudhuri, Delong Chen, Willy Chung, Emmanuel Dupoux, Hongyu Gong, Hervé Jégou, Alessandro Lazaric, Arjun Majumdar, Andrea Madotto, Franziska Meier, Florian Metze, Louis-Philippe Morency, Théo Moutakanni, Juan Pino, Basile Terver, Joseph Tighe, Paden Tomasello, Jitendra Malik

AVLEN: Audio-Visual-Language Embodied Navigation in 3D Environments

Recent years have seen embodied visual navigation advance in two distinct directions: (i) in equipping the AI agent to follow natural language instructions, and (ii) in making the navigable world multimodal, e.g., audio-visual navigation.…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-17 · Sudipta Paul, Amit K. Roy-Chowdhury, Anoop Cherian

Offline Visual Representation Learning for Embodied Navigation

How should we learn visual representations for embodied agents that must see and move? The status quo is tabula rasa in vivo, i.e. learning visual representations from scratch while also learning to move, potentially augmented with…

Computer Vision and Pattern Recognition · Computer Science · 2022-04-29 · Karmesh Yadav, Ram Ramrakhya, Arjun Majumdar, Vincent-Pierre Berges, Sachit Kuhar, Dhruv Batra, Alexei Baevski, Oleksandr Maksymets

Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding

This paper investigates the problem of understanding dynamic 3D scenes from egocentric observations, a key challenge in robotics and embodied AI. Unlike prior studies that explored this as long-form video understanding and utilized…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-10 · Yue Fan, Xiaojian Ma, Rongpeng Su, Jun Guo, Rujie Wu, Xi Chen, Qing Li