Related papers: Multi-View Learning for Vision-and-Language Naviga…

200 papers

Learning to navigate in a visual environment following natural-language instructions is a challenging task, because the multimodal inputs to the agent are highly variable, and the training data on a new task is often limited. In this paper,…

Computer Vision and Pattern Recognition · Computer Science 2020-04-07 Weituo Hao, Chunyuan Li, Xiujun Li, Lawrence Carin, Jianfeng Gao

Commanding a robot to navigate with natural language instructions is a long-term goal for grounded language understanding and robotics. But the dominant language is English, according to previous studies on vision-language navigation (VLN).…

Computation and Language · Computer Science 2020-12-08 An Yan, Xin Eric Wang, Jiangtao Feng, Lei Li, William Yang Wang

Vision-and-language navigation requires an agent to navigate through a real 3D environment following natural language instructions. Despite significant advances, few previous works are able to fully utilize the strong correspondence between…

Computer Vision and Pattern Recognition · Computer Science 2020-10-06 Yicong Hong, Cristian Rodriguez-Opazo, Qi Wu, Stephen Gould

Vision-and-Language Navigation (VLN) is a task where an agent navigates in an embodied indoor environment under human instructions. Previous works ignore the distribution of sample difficulty, and we argue that this potentially degrades their…

Machine Learning · Computer Science 2021-11-16 Jiwen Zhang, Zhongyu Wei, Jianqing Fan, Jiajie Peng

In the Vision-and-Language Navigation (VLN) task an embodied agent navigates a 3D environment, following natural language instructions. A challenge in this task is how to handle 'off the path' scenarios where an agent veers from a reference…

Computer Vision and Pattern Recognition · Computer Science 2021-10-01 Sonia Raychaudhuri, Saim Wani, Shivansh Patel, Unnat Jain, Angel X. Chang

Vision-and-Language Navigation (VLN) tasks require an agent to navigate through the environment based on language instructions. In this paper, we aim to solve two key challenges in this task: utilizing multilingual instructions for improved…

Computer Vision and Pattern Recognition · Computer Science 2022-07-06 Jialu Li, Hao Tan, Mohit Bansal

Recent advancements in Large Language Models (LLMs) and Vision-Language Models (VLMs) have made them powerful tools in embodied navigation, enabling agents to leverage commonsense and spatial reasoning for efficient exploration in…

Core to the vision-and-language navigation (VLN) challenge is building robust instruction representations and action decoding schemes, which can generalize well to previously unseen instructions and environments. In this paper, we report…

Computation and Language · Computer Science 2019-09-06 Xiujun Li, Chunyuan Li, Qiaolin Xia, Yonatan Bisk, Asli Celikyilmaz, Jianfeng Gao, Noah Smith, Yejin Choi

The emerging vision-and-language navigation (VLN) problem aims at learning to navigate an agent to the target location in unseen photo-realistic environments according to the given language instruction. The main challenges of VLN arise…

Computer Vision and Pattern Recognition · Computer Science 2020-11-24 Weixia Zhang, Chao Ma, Qi Wu, Xiaokang Yang

A goal-conditioned reinforcement learning agent exploring an environment will see a wealth of information throughout a trajectory, most of which is discarded when only performing on-policy updates with respect to the commanded goal.…

Vision-and-Language Navigation (VLN) is unique in that it requires turning relatively general natural-language instructions into robot agent actions, on the basis of the visible environment. This requires extracting value from two very…

Computation and Language · Computer Science 2020-07-30 Yuankai Qi, Zizheng Pan, Shengping Zhang, Anton van den Hengel, Qi Wu

Vision-language navigation (VLN) is the task of requiring an agent to carry out navigational instructions inside photo-realistic environments. One of the key challenges in VLN is how to conduct robust navigation by mitigating the…

Computer Vision and Pattern Recognition · Computer Science 2020-08-21 Hanqing Wang, Wenguan Wang, Tianmin Shu, Wei Liang, Jianbing Shen

Understanding spatial and visual information is essential for a navigation agent who follows natural language instructions. The current Transformer-based VLN agents entangle the orientation and vision information, which limits the gain from…

Computer Vision and Pattern Recognition · Computer Science 2022-09-27 Yue Zhang, Parisa Kordjamshidi

Vision-and-Language Navigation (VLN) requires agents to navigate photo-realistic environments following natural language instructions. Current methods predominantly rely on imitation learning, which suffers from limited generalization and…

Computer Vision and Pattern Recognition · Computer Science 2026-03-17 Jiangyang Li, Cong Wan, SongLin Dong, Chenhao Ding, Qiang Wang, Zhiheng Ma, Yihong Gong

Vision-and-Language Navigation (VLN) is a natural language grounding task where agents have to interpret natural language instructions in the context of visual scenes in a dynamic environment to achieve prescribed navigation goals.…

Computation and Language · Computer Science 2019-06-03 Haoshuo Huang, Vihan Jain, Harsh Mehta, Jason Baldridge, Eugene Ie

Humans can collaborate and complete tasks based on visual signals and instructions from the environment. Training a robot to do the same is difficult, especially because it must understand both the instruction and a complicated environment. Previous…

Artificial Intelligence · Computer Science 2023-05-12 Kairui Zhou

Advances in learning and representations have reinvigorated work that connects language to other modalities. A particularly exciting direction is Vision-and-Language Navigation (VLN), in which agents interpret natural language instructions…

Artificial Intelligence · Computer Science 2019-06-24 Vihan Jain, Gabriel Magalhaes, Alexander Ku, Ashish Vaswani, Eugene Ie, Jason Baldridge

Vision-Language Navigation requires the agent to follow natural language instructions to reach a specific target. The large discrepancy between seen and unseen environments makes it challenging for the agent to generalize well. Previous…

Computer Vision and Pattern Recognition · Computer Science 2022-09-13 Yujie Lu, Huiliang Zhang, Ping Nie, Weixi Feng, Wenda Xu, Xin Eric Wang, William Yang Wang

Natural language instructions for visual navigation often use scene descriptions (e.g., "bedroom") and object references (e.g., "green chairs") to provide a breadcrumb trail to a goal location. This work presents a transformer-based…

Computer Vision and Pattern Recognition · Computer Science 2021-10-28 Abhinav Moudgil, Arjun Majumdar, Harsh Agrawal, Stefan Lee, Dhruv Batra

Goal-oriented vision-language navigation requires robust exploration capabilities for agents to navigate to specified goals in unknown environments without step-by-step instructions. Existing methods tend to exclusively utilize…

Computer Vision and Pattern Recognition · Computer Science 2026-03-19 Songze Li, Zun Wang, Gengze Zhou, Jialu Li, Xiangyu Zeng, Ziyang Gong, Limin Wang, Yu Qiao, Qi Wu, Mohit Bansal, Yi Wang