English
Related papers

Related papers: Move2Hear: Active Audio-Visual Source Separation

200 papers

We explore active audio-visual separation for dynamic sound sources, where an embodied agent moves intelligently in a 3D environment to continuously isolate the time-varying audio stream being emitted by an object of interest. The agent…

Computer Vision and Pattern Recognition · Computer Science 2022-07-26 Sagnik Majumder , Kristen Grauman

In audio-visual navigation, an agent intelligently travels through a complex, unmapped 3D environment using both sights and sounds to find a sound source (e.g., a phone ringing in another room). Existing models learn to act at a fixed…

Computer Vision and Pattern Recognition · Computer Science 2021-02-12 Changan Chen , Sagnik Majumder , Ziad Al-Halah , Ruohan Gao , Santhosh Kumar Ramakrishnan , Kristen Grauman

The objective of this paper is to perform audio-visual sound source separation, i.e.~to separate component audios from a mixture based on the videos of sound sources. Moreover, we aim to pinpoint the source location in the input video…

Computer Vision and Pattern Recognition · Computer Science 2021-04-20 Lingyu Zhu , Esa Rahtu

Learning how objects sound from video is challenging, since they often heavily overlap in a single audio channel. Current methods for visually-guided audio source separation sidestep the issue by training with artificially mixed video…

Computer Vision and Pattern Recognition · Computer Science 2019-08-22 Ruohan Gao , Kristen Grauman

Audio-visual navigation task requires an agent to find a sound source in a realistic, unmapped 3D environment by utilizing egocentric audio-visual observations. Existing audio-visual navigation works assume a clean environment that solely…

Sound · Computer Science 2022-02-23 Yinfeng Yu , Wenbing Huang , Fuchun Sun , Changan Chen , Yikai Wang , Xiaohong Liu

We consider the problem of audio voice separation for binaural applications, such as earphones and hearing aids. While today's neural networks perform remarkably well (separating $4+$ sources with 2 microphones) they assume a known or fixed…

Sound · Computer Science 2022-07-18 Zhongweiyang Xu , Romit Roy Choudhury

Speech separation aims to separate individual voice from an audio mixture of multiple simultaneous talkers. Although audio-only approaches achieve satisfactory performance, they build on a strategy to handle the predefined conditions,…

Sound · Computer Science 2020-12-01 Peng Zhang , Jiaming Xu , Jing shi , Yunzhe Hao , Bo Xu

A crucial ability of mobile intelligent agents is to integrate the evidence from multiple sensory inputs in an environment and to make a sequence of actions to reach their goals. In this paper, we attempt to approach the problem of…

Computer Vision and Pattern Recognition · Computer Science 2020-03-10 Chuang Gan , Yiwei Zhang , Jiajun Wu , Boqing Gong , Joshua B. Tenenbaum

Moving around in the world is naturally a multisensory experience, but today's embodied agents are deaf---restricted to solely their visual perception of the environment. We introduce audio-visual navigation for complex, acoustically and…

Computer Vision and Pattern Recognition · Computer Science 2020-08-25 Changan Chen , Unnat Jain , Carl Schissler , Sebastia Vicenc Amengual Gari , Ziad Al-Halah , Vamsi Krishna Ithapu , Philip Robinson , Kristen Grauman

In this work we apply deep reinforcement learning to the problems of navigating a three-dimensional environment and inferring the locations of human speaker audio sources within, in the case where the only available information is the raw…

Sound · Computer Science 2021-11-30 Petros Giannakopoulos , Aggelos Pikrakis , Yannis Cotronis

Isolating the voice of a specific person while filtering out other voices or background noises is challenging when video is shot in noisy environments. We propose audio-visual methods to isolate the voice of a single speaker and eliminate…

Computer Vision and Pattern Recognition · Computer Science 2018-02-13 Aviv Gabbay , Ariel Ephrat , Tavi Halperin , Shmuel Peleg

This paper introduces an area-based source separation method designed for virtual meeting scenarios. The aim is to preserve speech signals from an unspecified number of sources within a defined spatial area in front of a linear microphone…

Audio and Speech Processing · Electrical Eng. & Systems 2024-08-20 Martin Strauss , Okan Köpüklü

Audio-visual navigation combines sight and hearing to navigate to a sound-emitting source in an unmapped environment. While recent approaches have demonstrated the benefits of audio input to detect and find the goal, they focus on clean and…

Sound · Computer Science 2023-01-04 Abdelrahman Younes , Daniel Honerkamp , Tim Welschehold , Abhinav Valada

Perceiving a scene most fully requires all the senses. Yet modeling how objects look and sound is challenging: most natural scenes and events contain multiple objects, and the audio track mixes all the sound sources together. We propose to…

Computer Vision and Pattern Recognition · Computer Science 2018-07-27 Ruohan Gao , Rogerio Feris , Kristen Grauman

Under noisy conditions, automatic speech recognition (ASR) can greatly benefit from the addition of visual signals coming from a video of the speaker's face. However, when multiple candidate speakers are visible this traditionally requires…

Audio and Speech Processing · Electrical Eng. & Systems 2022-05-12 Otavio Braga , Olivier Siohan

Speech enhancement and speech separation are two related tasks, whose purpose is to extract either one or more target speech signals, respectively, from a mixture of sounds generated by several sources. Traditionally, these tasks have been…

Audio and Speech Processing · Electrical Eng. & Systems 2021-03-16 Daniel Michelsanti , Zheng-Hua Tan , Shi-Xiong Zhang , Yong Xu , Meng Yu , Dong Yu , Jesper Jensen

Active speaker detection and speech enhancement have become two increasingly attractive topics in audio-visual scenario understanding. According to their respective characteristics, the scheme of independently designed architecture has been…

Sound · Computer Science 2022-07-08 Junwen Xiong , Yu Zhou , Peng Zhang , Lei Xie , Wei Huang , Yufei Zha

In this work we use deep reinforcement learning to create an autonomous agent that can navigate in a two-dimensional space using only raw auditory sensory information from the environment, a problem that has received very little attention…

Sound · Computer Science 2021-05-17 Petros Giannakopoulos , Aggelos Pikrakis , Yannis Cotronis

Audio-visual automatic speech recognition is a promising approach to robust ASR under noisy conditions. However, up until recently it had been traditionally studied in isolation assuming the video of a single speaking face matches the…

Audio and Speech Processing · Electrical Eng. & Systems 2022-05-13 Otavio Braga , Olivier Siohan

Segmenting objects in images and separating sound sources in audio are challenging tasks, in part because traditional approaches require large amounts of labeled data. In this paper we develop a neural network model for visual object…

Computer Vision and Pattern Recognition · Computer Science 2019-04-22 Andrew Rouditchenko , Hang Zhao , Chuang Gan , Josh McDermott , Antonio Torralba
‹ Prev 1 2 3 10 Next ›