Related papers: Visual Language Maps for Robot Navigation

Multimodal Spatial Language Maps for Robot Navigation and Manipulation

Grounding language to a navigating agent's observations can leverage pretrained multimodal foundation models to match perceptions to object or event descriptions. However, previous approaches remain disconnected from environment mapping,…

Robotics · Computer Science 2025-06-10 Chenguang Huang, Oier Mees, Andy Zeng, Wolfram Burgard

Audio Visual Language Maps for Robot Navigation

While interacting in the world is a multi-sensory experience, many robots continue to predominantly rely on visual perception to map and navigate in their environments. In this work, we propose Audio-Visual-Language Maps (AVLMaps), a…

Robotics · Computer Science 2023-03-28 Chenguang Huang, Oier Mees, Andy Zeng, Wolfram Burgard

IVLMap: Instance-Aware Visual Language Grounding for Consumer Robot Navigation

Vision-and-Language Navigation (VLN) is a challenging task that requires a robot to navigate in photo-realistic environments with human natural language promptings. Recent studies aim to handle this task by constructing the semantic spatial…

Computer Vision and Pattern Recognition · Computer Science 2024-03-29 Jiacui Huang, Hongtao Zhang, Mingbo Zhao, Zhou Wu

MAG-Nav: Language-Driven Object Navigation Leveraging Memory-Reserved Active Grounding

Visual navigation in unknown environments based solely on natural language descriptions is a key capability for intelligent robots. In this work, we propose a navigation framework built upon off-the-shelf Visual Language Models (VLMs),…

Robotics · Computer Science 2025-08-08 Weifan Zhang, Tingguang Li, Yuzhen Liu

Vision Language Models Can Parse Floor Plan Maps

Vision language models (VLMs) can simultaneously reason about images and texts to tackle many tasks, from visual question answering to image captioning. This paper focuses on map parsing, a novel task that is unexplored within the VLM…

Robotics · Computer Science 2025-11-26 David DeFazio, Hrudayangam Mehta, Meng Wang, Ping Yang, Jeremy Blackburn, Shiqi Zhang

OpenMap: Instruction Grounding via Open-Vocabulary Visual-Language Mapping

Grounding natural language instructions to visual observations is fundamental for embodied agents operating in open-world environments. Recent advances in visual-language mapping have enabled generalizable semantic representations by…

Robotics · Computer Science 2025-08-05 Danyang Li, Zenghui Yang, Guangpeng Qi, Songtao Pang, Guangyong Shang, Qiang Ma, Zheng Yang

Mobile Robot Navigation Using Hand-Drawn Maps: A Vision Language Model Approach

Hand-drawn maps can be used to convey navigation instructions between humans and robots in a natural and efficient manner. However, these maps can often contain inaccuracies such as scale distortions and missing landmarks which present…

Robotics · Computer Science 2025-04-30 Aaron Hao Tan, Angus Fung, Haitong Wang, Goldie Nejat

LiLMaps: Learnable Implicit Language Maps

One of the current trends in robotics is to employ large language models (LLMs) to provide non-predefined command execution and natural human-robot interaction. It is useful to have an environment map together with its language…

Robotics · Computer Science 2025-01-09 Evgenii Kruzhkov, Sven Behnke

Co-NavGPT: Multi-Robot Cooperative Visual Semantic Navigation Using Vision Language Models

Visual target navigation is a critical capability for autonomous robots operating in unknown environments, particularly in human-robot interaction scenarios. While classical and learning-based methods have shown promise, most existing…

Robotics · Computer Science 2025-05-07 Bangguo Yu, Qihao Yuan, Kailai Li, Hamidreza Kasaei, Ming Cao

Open-vocabulary Queryable Scene Representations for Real World Planning

Large language models (LLMs) have unlocked new capabilities of task planning from human instructions. However, prior attempts to apply LLMs to real-world robotic tasks are limited by the lack of grounding in the surrounding scene. In this…

Robotics · Computer Science 2022-10-18 Boyuan Chen, Fei Xia, Brian Ichter, Kanishka Rao, Keerthana Gopalakrishnan, Michael S. Ryoo, Austin Stone, Daniel Kappler

Navigation with VLM framework: Towards Going to Any Language

Navigating towards fully open language goals and exploring open scenes in an intelligent way have always raised significant challenges. Recently, Vision Language Models (VLMs) have demonstrated remarkable capabilities to reason with both…

Computer Vision and Pattern Recognition · Computer Science 2025-10-29 Zecheng Yin, Chonghao Cheng, Yao Guo, Zhen Li

VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation

Understanding how humans leverage semantic knowledge to navigate unfamiliar environments and decide where to explore next is pivotal for developing robots capable of human-like search behaviors. We introduce a zero-shot navigation approach,…

Robotics · Computer Science 2023-12-07 Naoki Yokoyama, Sehoon Ha, Dhruv Batra, Jiuguang Wang, Bernadette Bucher

DualMap: Online Open-Vocabulary Semantic Mapping for Natural Language Navigation in Dynamic Changing Scenes

We introduce DualMap, an online open-vocabulary mapping system that enables robots to understand and navigate dynamically changing environments through natural language queries. Designed for efficient semantic mapping and adaptability to…

Robotics · Computer Science 2025-12-16 Jiajun Jiang, Yiming Zhu, Zirui Wu, Jie Song

L3MVN: Leveraging Large Language Models for Visual Target Navigation

Visual target navigation in unknown environments is a crucial problem in robotics. Despite extensive investigation of classical and learning-based approaches in the past, robots lack common-sense knowledge about household objects and…

Robotics · Computer Science 2023-12-27 Bangguo Yu, Hamidreza Kasaei, Ming Cao

LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action

Goal-conditioned policies for robotic navigation can be trained on large, unannotated datasets, providing for good generalization to real-world settings. However, particularly in vision-based settings where specifying goals requires an…

Robotics · Computer Science 2022-07-27 Dhruv Shah, Blazej Osinski, Brian Ichter, Sergey Levine

VLPG-Nav: Object Navigation Using Visual Language Pose Graph and Object Localization Probability Maps

We present VLPG-Nav, a visual language navigation method for guiding robots to specified objects within household scenes. Unlike existing methods primarily focused on navigating the robot toward objects, our approach considers the…

Robotics · Computer Science 2024-08-16 Senthil Hariharan Arul, Dhruva Kumar, Vivek Sugirtharaj, Richard Kim, Xuewei Qi, Rajasimman Madhivanan, Arnie Sen, Dinesh Manocha

ImagineNav: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination

Visual navigation is an essential skill for home-assistance robots, providing the object-searching ability to accomplish long-horizon daily tasks. Many recent approaches use Large Language Models (LLMs) for commonsense inference to improve…

Robotics · Computer Science 2024-10-15 Xinxin Zhao, Wenzhe Cai, Likun Tang, Teng Wang

Do Visual-Language Grid Maps Capture Latent Semantics?

Visual-language models (VLMs) have recently been introduced in robotic mapping using the latent representations, i.e., embeddings, of the VLMs to represent semantics in the map. They allow moving from a limited set of human-created labels…

Robotics · Computer Science 2025-09-23 Matti Pekkanen, Tsvetomila Mihaylova, Francesco Verdoja, Ville Kyrki

Relational Scene Graphs for Object Grounding of Natural Language Commands

Robots are finding wider adoption in human environments, increasing the need for natural human-robot interaction. However, understanding a natural language command requires the robot to infer the intended task and how to decompose it into…

Robotics · Computer Science 2026-02-05 Julia Kuhn, Francesco Verdoja, Tsvetomila Mihaylova, Ville Kyrki

LIEREx: Language-Image Embeddings for Robotic Exploration

Semantic maps allow a robot to reason about its surroundings to fulfill tasks such as navigating known environments, finding specific objects, and exploring unmapped areas. Traditional mapping approaches provide accurate geometric…

Robotics · Computer Science 2026-02-03 Felix Igelbrink, Lennart Niecksch, Marian Renz, Martin Günther, Martin Atzmueller