Related papers: Semantic Audio-Visual Navigation

200 papers

Audio-visual navigation enables embodied agents to navigate toward sound-emitting targets by leveraging both auditory and visual cues. However, most existing approaches rely on precomputed room impulse responses (RIRs) for binaural audio…

Computer Vision and Pattern Recognition · Computer Science 2026-04-02 Yichen Zeng, Hebaixu Wang, Meng Liu, Yu Zhou, Chen Gao, Kehan Chen, Gongping Huang

In audio-visual navigation, an agent intelligently travels through a complex, unmapped 3D environment using both sights and sounds to find a sound source (e.g., a phone ringing in another room). Existing models learn to act at a fixed…

Computer Vision and Pattern Recognition · Computer Science 2021-02-12 Changan Chen, Sagnik Majumder, Ziad Al-Halah, Ruohan Gao, Santhosh Kumar Ramakrishnan, Kristen Grauman

Moving around in the world is naturally a multisensory experience, but today's embodied agents are deaf, restricted to solely their visual perception of the environment. We introduce audio-visual navigation for complex, acoustically and…

Computer Vision and Pattern Recognition · Computer Science 2020-08-25 Changan Chen, Unnat Jain, Carl Schissler, Sebastia Vicenc Amengual Gari, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman

Recent work on audio-visual navigation targets a single static sound in noise-free audio environments and struggles to generalize to unheard sounds. We introduce the novel dynamic audio-visual navigation benchmark in which an embodied AI…

Computer Vision and Pattern Recognition · Computer Science 2022-01-13 Abdelrahman Younes

The audio-visual navigation task requires an agent to find a sound source in a realistic, unmapped 3D environment by utilizing egocentric audio-visual observations. Existing audio-visual navigation works assume a clean environment that solely…

Sound · Computer Science 2022-02-23 Yinfeng Yu, Wenbing Huang, Fuchun Sun, Changan Chen, Yikai Wang, Xiaohong Liu

We introduce the visual acoustic matching task, in which an audio clip is transformed to sound like it was recorded in a target environment. Given an image of the target environment and a waveform for the source audio, the goal is to…

Computer Vision and Pattern Recognition · Computer Science 2022-06-15 Changan Chen, Ruohan Gao, Paul Calamia, Kristen Grauman

Audio-visual navigation combines sight and hearing to navigate to a sound-emitting source in an unmapped environment. While recent approaches have demonstrated the benefits of audio input to detect and find the goal, they focus on clean and…

Sound · Computer Science 2023-01-04 Abdelrahman Younes, Daniel Honerkamp, Tim Welschehold, Abhinav Valada

Humans can robustly recognize and localize objects by integrating visual and auditory cues. While machines are able to do the same now with images, less work has been done with sounds. This work develops an approach for dense semantic…

Computer Vision and Pattern Recognition · Computer Science 2020-03-10 Arun Balajee Vasudevan, Dengxin Dai, Luc Van Gool

Audio-visual Navigation refers to an agent utilizing visual and auditory information in complex 3D environments to accomplish target localization and path planning, thereby achieving autonomous navigation. The core challenge of this task…

Sound · Computer Science 2026-04-06 Xinyu Zhou, Yinfeng Yu

We consider the problem of object goal navigation in unseen environments. Solving this problem requires learning of contextual semantic priors, a challenging endeavour given the spatial and semantic variability of indoor environments.…

Computer Vision and Pattern Recognition · Computer Science 2022-03-10 Georgios Georgakis, Bernadette Bucher, Karl Schmeckpeper, Siddharth Singh, Kostas Daniilidis

A crucial ability of mobile intelligent agents is to integrate the evidence from multiple sensory inputs in an environment and to make a sequence of actions to reach their goals. In this paper, we attempt to approach the problem of…

Computer Vision and Pattern Recognition · Computer Science 2020-03-10 Chuang Gan, Yiwei Zhang, Jiajun Wu, Boqing Gong, Joshua B. Tenenbaum

In this paper our objectives are, first, networks that can embed audio and visual inputs into a common space that is suitable for cross-modal retrieval; and second, a network that can localize the object that sounds in an image, given the…

Computer Vision and Pattern Recognition · Computer Science 2018-07-27 Relja Arandjelović, Andrew Zisserman

We explore active audio-visual separation for dynamic sound sources, where an embodied agent moves intelligently in a 3D environment to continuously isolate the time-varying audio stream being emitted by an object of interest. The agent…

Computer Vision and Pattern Recognition · Computer Science 2022-07-26 Sagnik Majumder, Kristen Grauman

Over the past few years, there has been a great deal of research on navigation tasks in indoor environments using deep reinforcement learning agents. Most of these tasks use only visual information in the form of first-person images to…

Computer Vision and Pattern Recognition · Computer Science 2023-08-02 Haru Kondoh, Asako Kanezaki

Visual-audio navigation (VAN) is attracting more and more attention from the robotic community due to its broad applications, e.g., household robots and rescue robots. In this task, an embodied agent must search for and navigate to…

Robotics · Computer Science 2023-06-22 Hongcheng Wang, Yuxuan Wang, Fangwei Zhong, Mingdong Wu, Jianwei Zhang, Yizhou Wang, Hao Dong

Humans can robustly recognize and localize objects by using visual and/or auditory cues. While machines are able to do the same with visual data already, less work has been done with sounds. This work develops an approach for scene…

Sound · Computer Science 2022-03-01 Dengxin Dai, Arun Balajee Vasudevan, Jiri Matas, Luc Van Gool

Audio-Visual Embodied Navigation aims to enable agents to autonomously navigate to sound sources in unknown 3D environments using auditory cues. While current AVN methods excel on in-distribution sound sources, they exhibit poor…

Sound · Computer Science 2025-10-15 Yi Wang, Yinfeng Yu, Fuchun Sun, Liejun Wang, Wendong Zheng

In audio-visual navigation (AVN), an intelligent agent needs to navigate to a constantly sound-making object in complex 3D environments based on its audio and visual perceptions. While existing methods attempt to improve the navigation…

Sound · Computer Science 2022-06-02 Shunqi Mao, Chaoyi Zhang, Heng Wang, Weidong Cai

Imagine being able to listen to the birds chirping in a park without hearing the chatter from other hikers, or being able to block out traffic noise on a busy street while still being able to hear emergency sirens and car honks. We…

Sound · Computer Science 2023-11-02 Bandhav Veluri, Malek Itani, Justin Chan, Takuya Yoshioka, Shyamnath Gollakota

The Audio-Visual Segmentation (AVS) task aims to segment sounding objects in the visual space using audio cues. However, in this work, it is recognized that previous AVS methods show a heavy reliance on detrimental segmentation preferences…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Yaoting Wang, Peiwen Sun, Yuanchao Li, Honggang Zhang, Di Hu