Related papers: Multi-View Learning for Vision-and-Language Naviga…

Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training

Learning to navigate in a visual environment following natural-language instructions is a challenging task, because the multimodal inputs to the agent are highly variable, and the training data on a new task is often limited. In this paper,…

Computer Vision and Pattern Recognition · Computer Science · 2020-04-07 · Weituo Hao, Chunyuan Li, Xiujun Li, Lawrence Carin, Jianfeng Gao

Cross-Lingual Vision-Language Navigation

Commanding a robot to navigate with natural language instructions is a long-term goal for grounded language understanding and robotics. But the dominant language is English, according to previous studies on vision-language navigation (VLN).…

Computation and Language · Computer Science · 2020-12-08 · An Yan, Xin Eric Wang, Jiangtao Feng, Lei Li, William Yang Wang

Sub-Instruction Aware Vision-and-Language Navigation

Vision-and-language navigation requires an agent to navigate through a real 3D environment following natural language instructions. Despite significant advances, few previous works are able to fully utilize the strong correspondence between…

Computer Vision and Pattern Recognition · Computer Science · 2020-10-06 · Yicong Hong, Cristian Rodriguez-Opazo, Qi Wu, Stephen Gould

Curriculum Learning for Vision-and-Language Navigation

Vision-and-Language Navigation (VLN) is a task where an agent navigates in an embodied indoor environment under human instructions. Previous works ignore the distribution of sample difficulty, and we argue that this potentially degrades their…

Machine Learning · Computer Science · 2021-11-16 · Jiwen Zhang, Zhongyu Wei, Jianqing Fan, Jiajie Peng

Language-Aligned Waypoint (LAW) Supervision for Vision-and-Language Navigation in Continuous Environments

In the Vision-and-Language Navigation (VLN) task an embodied agent navigates a 3D environment, following natural language instructions. A challenge in this task is how to handle 'off the path' scenarios where an agent veers from a reference…

Computer Vision and Pattern Recognition · Computer Science · 2021-10-01 · Sonia Raychaudhuri, Saim Wani, Shivansh Patel, Unnat Jain, Angel X. Chang

CLEAR: Improving Vision-Language Navigation with Cross-Lingual, Environment-Agnostic Representations

Vision-and-Language Navigation (VLN) tasks require an agent to navigate through the environment based on language instructions. In this paper, we aim to solve two key challenges in this task: utilizing multilingual instructions for improved…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-06 · Jialu Li, Hao Tan, Mohit Bansal

Mem2Ego: Empowering Vision-Language Models with Global-to-Ego Memory for Long-Horizon Embodied Navigation

Recent advancements in Large Language Models (LLMs) and Vision-Language Models (VLMs) have made them powerful tools in embodied navigation, enabling agents to leverage commonsense and spatial reasoning for efficient exploration in…

Robotics · Computer Science · 2025-06-12 · Lingfeng Zhang, Yuecheng Liu, Zhanguang Zhang, Matin Aghaei, Yaochen Hu, Hongjian Gu, Mohammad Ali Alomrani, David Gamaliel Arcos Bravo, Raika Karimi, Atia Hamidizadeh, Haoping Xu, Guowei Huang, Zhanpeng Zhang, Tongtong Cao, Weichao Qiu, Xingyue Quan, Jianye Hao, Yuzheng Zhuang, Yingxue Zhang

Robust Navigation with Language Pretraining and Stochastic Sampling

Core to the vision-and-language navigation (VLN) challenge is building robust instruction representations and action decoding schemes, which can generalize well to previously unseen instructions and environments. In this paper, we report…

Computation and Language · Computer Science · 2019-09-06 · Xiujun Li, Chunyuan Li, Qiaolin Xia, Yonatan Bisk, Asli Celikyilmaz, Jianfeng Gao, Noah Smith, Yejin Choi

Language-guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning

The emerging vision-and-language navigation (VLN) problem aims at learning to navigate an agent to the target location in unseen photo-realistic environments according to the given language instruction. The main challenges of VLN arise…

Computer Vision and Pattern Recognition · Computer Science · 2020-11-24 · Weixia Zhang, Chao Ma, Qi Wu, Xiaokang Yang

Goal-Conditioned Agents that Learn Everything All at Once

A goal-conditioned reinforcement learning agent exploring an environment will see a wealth of information throughout a trajectory, most of which is discarded when only performing on-policy updates with respect to the commanded goal.…

Machine Learning · Computer Science · 2026-05-25 · Michael Matthews, Matthew Jackson, Michael Beukman, Thomas Foster, Alistair Letcher, Scott Fujimoto, Cédric Colas, Jakob Foerster

Object-and-Action Aware Model for Visual Language Navigation

Vision-and-Language Navigation (VLN) is unique in that it requires turning relatively general natural-language instructions into robot agent actions, on the basis of the visible environment. This requires to extract value from two very…

Computation and Language · Computer Science · 2020-07-30 · Yuankai Qi, Zizheng Pan, Shengping Zhang, Anton van den Hengel, Qi Wu

Active Visual Information Gathering for Vision-Language Navigation

Vision-language navigation (VLN) is the task of entailing an agent to carry out navigational instructions inside photo-realistic environments. One of the key challenges in VLN is how to conduct a robust navigation by mitigating the…

Computer Vision and Pattern Recognition · Computer Science · 2020-08-21 · Hanqing Wang, Wenguan Wang, Tianmin Shu, Wei Liang, Jianbing Shen

LOViS: Learning Orientation and Visual Signals for Vision and Language Navigation

Understanding spatial and visual information is essential for a navigation agent who follows natural language instructions. The current Transformer-based VLN agents entangle the orientation and vision information, which limits the gain from…

Computer Vision and Pattern Recognition · Computer Science · 2022-09-27 · Yue Zhang, Parisa Kordjamshidi

Trajectory-Diversity-Driven Robust Vision-and-Language Navigation

Vision-and-Language Navigation (VLN) requires agents to navigate photo-realistic environments following natural language instructions. Current methods predominantly rely on imitation learning, which suffers from limited generalization and…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-17 · Jiangyang Li, Cong Wan, SongLin Dong, Chenhao Ding, Qiang Wang, Zhiheng Ma, Yihong Gong

Multi-modal Discriminative Model for Vision-and-Language Navigation

Vision-and-Language Navigation (VLN) is a natural language grounding task where agents have to interpret natural language instructions in the context of visual scenes in a dynamic environment to achieve prescribed navigation goals.…

Computation and Language · Computer Science · 2019-06-03 · Haoshuo Huang, Vihan Jain, Harsh Mehta, Jason Baldridge, Eugene Ie

Accessible Instruction-Following Agent

Humans can collaborate and complete tasks based on visual signals and instruction from the environment. Training such a robot is difficult especially due to the understanding of the instruction and the complicated environment. Previous…

Artificial Intelligence · Computer Science · 2023-05-12 · Kairui Zhou

Stay on the Path: Instruction Fidelity in Vision-and-Language Navigation

Advances in learning and representations have reinvigorated work that connects language to other modalities. A particularly exciting direction is Vision-and-Language Navigation (VLN), in which agents interpret natural language instructions…

Artificial Intelligence · Computer Science · 2019-06-24 · Vihan Jain, Gabriel Magalhaes, Alexander Ku, Ashish Vaswani, Eugene Ie, Jason Baldridge

Anticipating the Unseen Discrepancy for Vision and Language Navigation

Vision-Language Navigation requires the agent to follow natural language instructions to reach a specific target. The large discrepancy between seen and unseen environments makes it challenging for the agent to generalize well. Previous…

Computer Vision and Pattern Recognition · Computer Science · 2022-09-13 · Yujie Lu, Huiliang Zhang, Ping Nie, Weixi Feng, Wenda Xu, Xin Eric Wang, William Yang Wang

SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation

Natural language instructions for visual navigation often use scene descriptions (e.g., "bedroom") and object references (e.g., "green chairs") to provide a breadcrumb trail to a goal location. This work presents a transformer-based…

Computer Vision and Pattern Recognition · Computer Science · 2021-10-28 · Abhinav Moudgil, Arjun Majumdar, Harsh Agrawal, Stefan Lee, Dhruv Batra

Learning Goal-Oriented Vision-and-Language Navigation with Self-Improving Demonstrations at Scale

Goal-oriented vision-language navigation requires robust exploration capabilities for agents to navigate to specified goals in unknown environments without step-by-step instructions. Existing methods tend to exclusively utilize…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-19 · Songze Li, Zun Wang, Gengze Zhou, Jialu Li, Xiangyu Zeng, Ziyang Gong, Limin Wang, Yu Qiao, Qi Wu, Mohit Bansal, Yi Wang