Related papers: Object-and-Action Aware Model for Visual Language …

200 papers

Natural language instructions for visual navigation often use scene descriptions (e.g., "bedroom") and object references (e.g., "green chairs") to provide a breadcrumb trail to a goal location. This work presents a transformer-based…

Computer Vision and Pattern Recognition · Computer Science 2021-10-28 Abhinav Moudgil, Arjun Majumdar, Harsh Agrawal, Stefan Lee, Dhruv Batra

Vision-and-Language Navigation (VLN) refers to the task of enabling autonomous robots to navigate unfamiliar environments by following natural language instructions. While recent Large Vision-Language Models (LVLMs) have shown promise in…

Computer Vision and Pattern Recognition · Computer Science 2025-08-06 Vebjørn Haug Kåsene, Pierre Lison

Vision-and-Language Navigation (VLN) is a challenging task that requires a robot to navigate in photo-realistic environments with human natural language promptings. Recent studies aim to handle this task by constructing the semantic spatial…

Computer Vision and Pattern Recognition · Computer Science 2024-03-29 Jiacui Huang, Hongtao Zhang, Mingbo Zhao, Zhou Wu

Vision-and-Language Navigation (VLN) is a natural language grounding task where agents have to interpret natural language instructions in the context of visual scenes in a dynamic environment to achieve prescribed navigation goals.…

Computation and Language · Computer Science 2019-06-03 Haoshuo Huang, Vihan Jain, Harsh Mehta, Jason Baldridge, Eugene Ie

Vision-and-Language Navigation (VLN) requires an agent to ground language instructions to its own movement within a visual environment. While state-of-the-art methods leverage the reasoning capabilities of Vision-Language Models (VLMs) for…

Vision-Language Navigation (VLN) is a challenging task which requires an agent to align complex visual observations to language instructions to reach the goal position. Most existing VLN agents directly learn to align the raw directional…

Computer Vision and Pattern Recognition · Computer Science 2024-03-15 Bingqian Lin, Yi Zhu, Xiaodan Liang, Liang Lin, Jianzhuang Liu

Vision-and-Language Navigation (VLN) aims to navigate to the target location by following a given instruction. Unlike existing methods focused on predicting a more accurate action at each step in navigation, in this paper, we make the first…

Computer Vision and Pattern Recognition · Computer Science 2023-08-08 Chongyang Zhao, Yuankai Qi, Qi Wu

The emerging vision-and-language navigation (VLN) problem aims at learning to navigate an agent to the target location in unseen photo-realistic environments according to the given language instruction. The main challenges of VLN arise…

Computer Vision and Pattern Recognition · Computer Science 2020-11-24 Weixia Zhang, Chao Ma, Qi Wu, Xiaokang Yang

Vision-and-Language Navigation (VLN) requires an agent to follow natural-language instructions, explore the given environments, and reach the desired target locations. These step-by-step navigational instructions are crucial when the agent…

Computation and Language · Computer Science 2020-05-08 Yubo Zhang, Hao Tan, Mohit Bansal

In the Vision-and-Language Navigation (VLN) task an embodied agent navigates a 3D environment, following natural language instructions. A challenge in this task is how to handle 'off the path' scenarios where an agent veers from a reference…

Computer Vision and Pattern Recognition · Computer Science 2021-10-01 Sonia Raychaudhuri, Saim Wani, Shivansh Patel, Unnat Jain, Angel X. Chang

Vision-Language Navigation (VLN) is a task where agents learn to navigate following natural language instructions. The key to this task is to perceive both the visual scene and natural language sequentially. Conventional approaches exploit…

Computer Vision and Pattern Recognition · Computer Science 2020-04-02 Fengda Zhu, Yi Zhu, Xiaojun Chang, Xiaodan Liang

Visual navigation in unknown environments based solely on natural language descriptions is a key capability for intelligent robots. In this work, we propose a navigation framework built upon off-the-shelf Visual Language Models (VLMs),…

Robotics · Computer Science 2025-08-08 Weifan Zhang, Tingguang Li, Yuzhen Liu

Developing agents capable of navigating to a target location based on language instructions and visual information, known as vision-language navigation (VLN), has attracted widespread interest. Most research has focused on ground-based…

Computer Vision and Pattern Recognition · Computer Science 2024-10-11 Xiangyu Wang, Donglin Yang, Ziqin Wang, Hohin Kwan, Jinyu Chen, Wenjun Wu, Hongsheng Li, Yue Liao, Si Liu

Vision-Language Navigation (VLN) tasks require an agent to follow human language instructions to navigate in previously unseen environments. This challenging field involving problems in natural language processing, computer vision,…

Computer Vision and Pattern Recognition · Computer Science 2022-04-05 Wansen Wu, Tao Chang, Xinmeng Li

Vision-language navigation (VLN) is the task of entailing an agent to carry out navigational instructions inside photo-realistic environments. One of the key challenges in VLN is how to conduct a robust navigation by mitigating the…

Computer Vision and Pattern Recognition · Computer Science 2020-08-21 Hanqing Wang, Wenguan Wang, Tianmin Shu, Wei Liang, Jianbing Shen

We study the task of zero-shot vision-and-language navigation (ZS-VLN), a practical yet challenging problem in which an agent learns to navigate following a path described by language instructions without requiring any path-instruction…

Computer Vision and Pattern Recognition · Computer Science 2023-08-17 Peihao Chen, Xinyu Sun, Hongyan Zhi, Runhao Zeng, Thomas H. Li, Gaowen Liu, Mingkui Tan, Chuang Gan

Incremental decision making in real-world environments is one of the most challenging tasks in embodied artificial intelligence. One particularly demanding scenario is Vision and Language Navigation (VLN) which requires visual and natural…

Artificial Intelligence · Computer Science 2024-01-25 Raphael Schumann, Wanrong Zhu, Weixi Feng, Tsu-Jui Fu, Stefan Riezler, William Yang Wang

Aerial Vision-and-Language Navigation (VLN) aims to enable unmanned aerial vehicles (UAVs) to interpret natural language instructions and navigate complex urban environments using onboard visual observation. This task holds promise for…

Computer Vision and Pattern Recognition · Computer Science 2026-04-16 Huilin Xu, Zhuoyang Liu, Yixiang Luomei, Feng Xu

Vision-and-Language Navigation (VLN) aims to enable an embodied agent to follow natural-language instructions and navigate to a target location in unseen 3D environments. We argue that adapting VLMs to VLN requires endowing them with two…

Computer Vision and Pattern Recognition · Computer Science 2026-05-01 Pengna Li, Kangyi Wu, Shaoqing Xu, Fang Li, Hanbing Li, Lin Zhao, Kailin Lyu, Long Chen, Zhi-Xin Yang, Nanning Zheng

Vision-language navigation (VLN) requires intelligent agents to navigate environments by interpreting linguistic instructions alongside visual observations, serving as a cornerstone task in Embodied AI. Current VLN research for unmanned…
