Related papers: Instance-Level Semantic Maps for Vision Language N…

200 papers

Vision-and-Language Navigation (VLN) is a challenging task that requires a robot to navigate in photo-realistic environments following human natural-language prompts. Recent studies aim to handle this task by constructing the semantic spatial…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-29 · Jiacui Huang, Hongtao Zhang, Mingbo Zhao, Zhou Wu

Humans excel at forming mental maps of their surroundings, equipping them to understand object relationships and navigate based on language queries. Our previous work, SI Maps (Nanwani L, Agarwal A, Jain K, et al. Instance-level semantic…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-28 · Laksh Nanwani, Kumaraditya Gupta, Aditya Mathur, Swayam Agrawal, A. H. Abdul Hafez, K. Madhava Krishna

We consider the problem of Vision-and-Language Navigation (VLN). The majority of current methods for VLN are trained end-to-end using either unstructured memory such as LSTM, or using cross-modal attention over the egocentric observations…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-22 · Georgios Georgakis, Karl Schmeckpeper, Karan Wanchoo, Soham Dan, Eleni Miltsakaki, Dan Roth, Kostas Daniilidis

Robots require a semantic understanding of their surroundings to operate in an efficient and explainable way in human environments. In the literature, there has been an extensive focus on object labeling and exhaustive scene graph…

Robotics · Computer Science · 2024-04-16 · Roberto Bigazzi, Lorenzo Baraldi, Shreyas Kousik, Rita Cucchiara, Marco Pavone

Vision-and-Language Navigation (VLN) requires an agent to navigate in a real-world environment following natural language instructions. From both the textual and visual perspectives, we find that the relationships among the scene, its…

Computer Vision and Pattern Recognition · Computer Science · 2020-12-29 · Yicong Hong, Cristian Rodriguez-Opazo, Yuankai Qi, Qi Wu, Stephen Gould

One of the current trends in robotics is to employ large language models (LLMs) to provide non-predefined command execution and natural human-robot interaction. It is useful to have an environment map together with its language…

Robotics · Computer Science · 2025-01-09 · Evgenii Kruzhkov, Sven Behnke

To autonomously navigate and plan interactions in real-world environments, robots require the ability to robustly perceive and map complex, unstructured surrounding scenes. Besides building an internal representation of the observed scene…

Visual navigation is an essential skill for home-assistance robots, providing the object-searching ability to accomplish long-horizon daily tasks. Many recent approaches use Large Language Models (LLMs) for commonsense inference to improve…

Robotics · Computer Science · 2024-10-15 · Xinxin Zhao, Wenzhe Cai, Likun Tang, Teng Wang

Visual navigation is a fundamental capability for autonomous home-assistance robots, enabling long-horizon tasks such as object search. While recent methods have leveraged Large Language Models (LLMs) to incorporate commonsense reasoning…

Robotics · Computer Science · 2026-05-01 · Teng Wang, Xinxin Zhao, Wenzhe Cai, Changyin Sun

In the Vision-and-Language Navigation (VLN) task, the agent is required to navigate to a destination following a natural language instruction. While learning-based approaches have been a major solution to the task, they suffer from high…

Artificial Intelligence · Computer Science · 2024-08-13 · Zhaohuan Zhan, Lisha Yu, Sijie Yu, Guang Tan

Navigational signs are common aids for human wayfinding and scene understanding, but are underutilized by robots. We argue that they benefit robot navigation and scene understanding, by directly encoding privileged information on actions,…

Robotics · Computer Science · 2025-09-17 · Ayush Agrawal, Joel Loo, Nicky Zimmerman, David Hsu

Vision-and-language navigation (VLN) enables the agent to navigate to a remote location following the natural language instruction in 3D environments. To represent the previously visited environment, most approaches for VLN implement memory…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-25 · Zihan Wang, Xiangyang Li, Jiahao Yang, Yeqi Liu, Shuqiang Jiang

We explore the use of language as a perceptual representation for vision-and-language navigation (VLN), with a focus on low-data settings. Our approach uses off-the-shelf vision systems for image captioning and object detection to convert…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-02 · Bowen Pan, Rameswar Panda, SouYoung Jin, Rogerio Feris, Aude Oliva, Phillip Isola, Yoon Kim

Vision-and-Language Navigation (VLN) is the task that requires an agent to navigate through the environment based on natural language instructions. At each step, the agent takes the next action by selecting from a set of navigable…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-12 · Jialu Li, Mohit Bansal

Large-scale pre-training has shown promising results on the vision-and-language navigation (VLN) task. However, most existing pre-training methods employ discrete panoramas to learn visual-textual associations. This requires the model to…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-04 · Dong An, Yuankai Qi, Yangguang Li, Yan Huang, Liang Wang, Tieniu Tan, Jing Shao

We present Vision-based Navigation with Language-based Assistance (VNLA), a grounded vision-language task where an agent with visual perception is guided via language to find objects in photorealistic indoor environments. The task emulates…

Machine Learning · Computer Science · 2019-04-09 · Khanh Nguyen, Debadeepta Dey, Chris Brockett, Bill Dolan

This paper presents a novel approach for the Vision-and-Language Navigation (VLN) task in continuous 3D environments, which requires an autonomous agent to follow natural language instructions in unseen environments. Existing end-to-end…

Vision-and-language navigation (VLN) is a multimodal task where an agent follows natural language instructions and navigates in visual environments. Multiple setups have been proposed, and researchers apply new model architectures or…

Computer Vision and Pattern Recognition · Computer Science · 2022-05-05 · Wanrong Zhu, Yuankai Qi, Pradyumna Narayana, Kazoo Sone, Sugato Basu, Xin Eric Wang, Qi Wu, Miguel Eckstein, William Yang Wang

Incremental decision making in real-world environments is one of the most challenging tasks in embodied artificial intelligence. One particularly demanding scenario is Vision and Language Navigation (VLN), which requires visual and natural…

Artificial Intelligence · Computer Science · 2024-01-25 · Raphael Schumann, Wanrong Zhu, Weixi Feng, Tsu-Jui Fu, Stefan Riezler, William Yang Wang

Autonomous navigation in unfamiliar environments often relies on geometric mapping and planning strategies that overlook rich semantic cues such as signs, room numbers, and textual labels. We propose a novel semantic navigation framework…

Robotics · Computer Science · 2026-01-13 · Jing Cao, Nishanth Kumar, Aidan Curtis