Related papers: Embodied Visual Active Learning for Semantic Segme…

Embodied Learning for Lifelong Visual Perception

We study lifelong visual perception in an embodied setup, where we develop new models and compare various agents that navigate in buildings and occasionally request annotations which, in turn, are used to refine their visual perception…

Computer Vision and Pattern Recognition · Computer Science 2021-12-30 David Nilsson , Aleksis Pirinen , Erik Gärtner , Cristian Sminchisescu

Deep Learning for Embodied Vision Navigation: A Survey

The "embodied visual navigation" problem requires an agent to navigate in a 3D environment relying mainly on its first-person observation. This problem has attracted rising attention in recent years due to its wide application in autonomous…

Robotics · Computer Science 2021-10-12 Fengda Zhu , Yi Zhu , Vincent CS Lee , Xiaodan Liang , Xiaojun Chang

Embodied Agents for Efficient Exploration and Smart Scene Description

The development of embodied agents that can communicate with humans in natural language has gained increasing interest over the last years, as it facilitates the diffusion of robotic platforms in human-populated environments. As a step…

Robotics · Computer Science 2024-04-16 Roberto Bigazzi , Marcella Cornia , Silvia Cascianelli , Lorenzo Baraldi , Rita Cucchiara

Explore and Explain: Self-supervised Navigation and Recounting

Embodied AI has been recently gaining attention as it aims to foster the development of autonomous and intelligent agents. In this paper, we devise a novel embodied setting in which an agent needs to explore a previously unknown environment…

Computer Vision and Pattern Recognition · Computer Science 2024-04-16 Roberto Bigazzi , Federico Landi , Marcella Cornia , Silvia Cascianelli , Lorenzo Baraldi , Rita Cucchiara

Learning to Explore Informative Trajectories and Samples for Embodied Perception

We are witnessing significant progress on perception models, specifically those trained on large-scale internet images. However, efficiently generalizing these perception models to unseen embodied tasks is insufficiently studied, which will…

Robotics · Computer Science 2023-03-21 Ya Jing , Tao Kong

Embodied Visual Recognition

Passive visual systems typically fail to recognize objects in the amodal setting where they are heavily occluded. In contrast, humans and other embodied agents have the ability to move in the environment, and actively control the viewing…

Computer Vision and Pattern Recognition · Computer Science 2019-04-10 Jianwei Yang , Zhile Ren , Mingze Xu , Xinlei Chen , David Crandall , Devi Parikh , Dhruv Batra

Towards Embodied Scene Description

Embodiment is an important characteristic for all intelligent agents (creatures and robots), while existing scene description tasks mainly focus on analyzing images passively and the semantic understanding of the scenario is separated from…

Robotics · Computer Science 2020-05-08 Sinan Tan , Huaping Liu , Di Guo , Xinyu Zhang , Fuchun Sun

Embodied Active Domain Adaptation for Semantic Segmentation via Informative Path Planning

This work presents an embodied agent that can adapt its semantic segmentation network to new indoor environments in a fully autonomous way. Because semantic segmentation networks fail to generalize well to unseen environments, the agent…

Robotics · Computer Science 2022-07-05 René Zurbrügg , Hermann Blum , Cesar Cadena , Roland Siegwart , Lukas Schmid

MaAST: Map Attention with Semantic Transformers for Efficient Visual Navigation

Visual navigation for autonomous agents is a core task in the fields of computer vision and robotics. Learning-based methods, such as deep reinforcement learning, have the potential to outperform the classical solutions developed for this…

Computer Vision and Pattern Recognition · Computer Science 2021-03-23 Zachary Seymour , Kowshik Thopalli , Niluthpol Mithun , Han-Pang Chiu , Supun Samarasekera , Rakesh Kumar

Autonomous Embodied Agents: When Robotics Meets Deep Learning Reasoning

The increase in available computing power and the Deep Learning revolution have allowed the exploration of new topics and frontiers in Artificial Intelligence research. A new field called Embodied Artificial Intelligence, which places at…

Robotics · Computer Science 2025-05-05 Roberto Bigazzi

Multi-Agent Embodied Visual Semantic Navigation with Scene Prior Knowledge

In visual semantic navigation, the robot navigates to a target object with egocentric visual observations and the class label of the target is given. It is a meaningful task inspiring a surge of relevant research. However, most of the…

Artificial Intelligence · Computer Science 2021-09-21 Xinzhu Liu , Di Guo , Huaping Liu , Fuchun Sun

Visual Representations for Semantic Target Driven Navigation

What is a good visual representation for autonomous agents? We address this question in the context of semantic visual navigation, which is the problem of a robot finding its way through a complex environment to a target object, e.g. go to…

Computer Vision and Pattern Recognition · Computer Science 2019-07-04 Arsalan Mousavian , Alexander Toshev , Marek Fiser , Jana Kosecka , Ayzaan Wahid , James Davidson

Environment Predictive Coding for Embodied Agents

We introduce environment predictive coding, a self-supervised approach to learn environment-level representations for embodied agents. In contrast to prior work on self-supervised learning for images, we aim to jointly encode a series of…

Computer Vision and Pattern Recognition · Computer Science 2021-02-05 Santhosh K. Ramakrishnan , Tushar Nagarajan , Ziad Al-Halah , Kristen Grauman

Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation

Embodied scene understanding requires not only comprehending visual-spatial information that has been observed but also determining where to explore next in the 3D physical world. Existing 3D Vision-Language (3D-VL) models primarily focus…

Computer Vision and Pattern Recognition · Computer Science 2025-07-31 Ziyu Zhu , Xilin Wang , Yixuan Li , Zhuofan Zhang , Xiaojian Ma , Yixin Chen , Baoxiong Jia , Wei Liang , Qian Yu , Zhidong Deng , Siyuan Huang , Qing Li

An Exploration of Embodied Visual Exploration

Embodied computer vision considers perception for robots in novel, unstructured environments. Of particular importance is the embodied visual exploration problem: how might a robot equipped with a camera scope out a new environment? Despite…

Computer Vision and Pattern Recognition · Computer Science 2020-08-24 Santhosh K. Ramakrishnan , Dinesh Jayaraman , Kristen Grauman

Semantic Segmentation with Active Semi-Supervised Learning

Using deep learning, we now have the ability to create exceptionally good semantic segmentation systems; however, collecting the prerequisite pixel-wise annotations for training images remains expensive and time-consuming. Therefore, it…

Computer Vision and Pattern Recognition · Computer Science 2022-10-19 Aneesh Rangnekar , Christopher Kanan , Matthew Hoffman

SoundSpaces: Audio-Visual Navigation in 3D Environments

Moving around in the world is naturally a multisensory experience, but today's embodied agents are deaf, restricted to solely their visual perception of the environment. We introduce audio-visual navigation for complex, acoustically and…

Computer Vision and Pattern Recognition · Computer Science 2020-08-25 Changan Chen , Unnat Jain , Carl Schissler , Sebastia Vicenc Amengual Gari , Ziad Al-Halah , Vamsi Krishna Ithapu , Philip Robinson , Kristen Grauman

Look, Listen, and Act: Towards Audio-Visual Embodied Navigation

A crucial ability of mobile intelligent agents is to integrate the evidence from multiple sensory inputs in an environment and to make a sequence of actions to reach their goals. In this paper, we attempt to approach the problem of…

Computer Vision and Pattern Recognition · Computer Science 2020-03-10 Chuang Gan , Yiwei Zhang , Jiajun Wu , Boqing Gong , Joshua B. Tenenbaum

Active Self-Training for Weakly Supervised 3D Scene Semantic Segmentation

Since the preparation of labeled data for training semantic segmentation networks of point clouds is a time-consuming process, weakly supervised approaches have been introduced to learn from only a small fraction of data. These methods are…

Computer Vision and Pattern Recognition · Computer Science 2022-09-16 Gengxin Liu , Oliver van Kaick , Hui Huang , Ruizhen Hu

Teaching Agents how to Map: Spatial Reasoning for Multi-Object Navigation

In the context of visual navigation, the capacity to map a novel environment is necessary for an agent to exploit its observation history in the considered place and efficiently reach known goals. This ability can be associated with spatial…

Computer Vision and Pattern Recognition · Computer Science 2023-04-26 Pierre Marza , Laetitia Matignon , Olivier Simonin , Christian Wolf