Related papers: Interactive Semantic Map Representation for Skill-…
What is a good visual representation for autonomous agents? We address this question in the context of semantic visual navigation, which is the problem of a robot finding its way through a complex environment to a target object, e.g. go to…
Robots require a semantic understanding of their surroundings to operate in an efficient and explainable way in human environments. In the literature, there has been an extensive focus on object labeling and exhaustive scene graph…
Navigating complex indoor environments requires a deep understanding of the space the robotic agent is acting into to correctly inform the navigation process of the agent towards the goal location. In recent learning-based navigation…
This work focuses on the problem of visual target navigation, which is very important for autonomous robots as it is closely related to high-level tasks. To find a special object in unknown environments, classical and learning-based…
Being able to perceive the semantics and the spatial structure of the environment is essential for visual navigation of a household robot. However, most existing works only employ visual backbones pre-trained either with independent images…
Visual navigation requires the robot to reach a specified goal such as an image, based on a sequence of first-person visual observations. While recent learning-based approaches have made significant progress, they often focus on improving…
To autonomously navigate and plan interactions in real-world environments, robots require the ability to robustly perceive and map complex, unstructured surrounding scenes. Besides building an internal representation of the observed scene…
In visual semantic navigation, the robot navigates to a target object with egocentric visual observations and the class label of the target is given. It is a meaningful task inspiring a surge of relevant research. However, most of the…
Service robots operating in cluttered human environments such as homes, offices, and schools cannot rely on predefined object arrangements and must continuously update their semantic and spatial estimates while dealing with possible…
Navigation has been classically solved in robotics through the combination of SLAM and planning. More recently, beyond waypoint planning, problems involving significant components of (visual) high-level reasoning have been explored in…
We present the Semantic Robot Programming (SRP) paradigm as a convergence of robot programming by demonstration and semantic mapping. In SRP, a user can directly program a robot manipulator by demonstrating a snapshot of their intended goal…
Visual navigation for autonomous agents is a core task in the fields of computer vision and robotics. Learning-based methods, such as deep reinforcement learning, have the potential to outperform the classical solutions developed for this…
Interactive robots navigating photo-realistic environments need to be trained to effectively leverage and handle the dynamic nature of dialogue in addition to the challenges underlying vision-and-language navigation (VLN). In this paper, we…
Visual Semantic Navigation (VSN) is the ability of a robot to learn visual semantic information for navigating in unseen environments. These VSN models are typically tested in those virtual environments where they are trained, mainly using…
Semantic navigation requires an agent to navigate toward a specified target in an unseen environment. Employing an imaginative navigation strategy that predicts future scenes before taking action, can empower the agent to find target…
We propose a novel approach to robot-operated active understanding of unknown indoor scenes, based on online RGBD reconstruction with semantic segmentation. In our method, the exploratory robot scanning is both driven by and targeting at…
Object goal navigation is an important problem in Embodied AI that involves guiding the agent to navigate to an instance of the object category in an unknown environment -- typically an indoor scene. Unfortunately, current state-of-the-art…
This paper presents an adaptive transformer model named SegmATRon for embodied image semantic segmentation. Its distinctive feature is the adaptation of model weights during inference on several images using a hybrid multicomponent loss…
The 3D scene graph models spatial relationships between objects, enabling the agent to efficiently navigate in a partially observable environment and predict the location of the target object.This paper proposes an original framework named…
We present a target-driven navigation system to improve mapless visual navigation in indoor scenes. Our method takes a multi-view observation of a robot and a target as inputs at each time step to provide a sequence of actions that move the…