Related papers: Spatial Aware Multi-Task Learning Based Speech Sep…
This paper introduces an area-based source separation method designed for virtual meeting scenarios. The aim is to preserve speech signals from an unspecified number of sources within a defined spatial area in front of a linear microphone…
A three-stage approach is proposed for speaker counting and speech separation in noisy and reverberant environments. In the spatial feature extraction, a spatial coherence matrix (SCM) is computed using whitened relative transfer functions…
Speech separation with several speakers is a challenging task because of the non-stationarity of the speech and the strong signal similarity between interferent sources. Current state-of-the-art solutions can separate well the different…
In this paper, we present a novel multi-channel speech extraction system to simultaneously extract multiple clean individual sources from a mixture in noisy and reverberant environments. The proposed method is built on an improved…
Humans can robustly recognize and localize objects by integrating visual and auditory cues. While machines are able to do the same now with images, less work has been done with sounds. This work develops an approach for dense semantic…
This paper addresses the challenge of audio-visual single-microphone speech separation and enhancement in the presence of real-world environmental noise. Our approach is based on generative inverse sampling, where we model clean speech and…
We consider the problem of audio voice separation for binaural applications, such as earphones and hearing aids. While today's neural networks perform remarkably well (separating $4+$ sources with 2 microphones) they assume a known or fixed…
We consider the task of region-based source separation of reverberant multi-microphone recordings. We assume pre-defined spatial regions with a single active source per region. The objective is to estimate the signals from the individual…
Speaker Diarization (SD) aims at grouping speech segments that belong to the same speaker. This task is required in many speech-processing applications, such as rich meeting transcription. In this context, distant microphone arrays usually…
Speech data collected in real-world scenarios often encounters two issues. First, multiple sources may exist simultaneously, and the number of sources may vary with time. Second, the existence of background noise in recording is inevitable.…
Imagine being able to listen to the birds chirping in a park without hearing the chatter from other hikers, or being able to block out traffic noise on a busy street while still being able to hear emergency sirens and car honks. We…
This paper proposes a neural network based speech separation method using spatially distributed microphones. Unlike with traditional microphone array settings, neither the number of microphones nor their spatial arrangement is known in…
Speech separation seeks to isolate individual speech signals from a multi-talk speech mixture. Despite much progress, a system well-trained on synthetic data often experiences performance degradation on out-of-domain data, such as…
We introduce the active audio-visual source separation problem, where an agent must move intelligently in order to better isolate the sounds coming from an object of interest in its environment. The agent hears multiple audio sources…
Teleconferencing is becoming essential during the COVID-19 pandemic. However, in real-world applications, speech quality can deteriorate due to, for example, background interference, noise, or reverberation. To solve this problem, target…
This paper addresses the issue of active speaker detection (ASD) in noisy environments and formulates a robust active speaker detection (rASD) problem. Existing ASD approaches leverage both audio and visual modalities, but non-speech sounds…
In this paper we describe a speaker diarization system that enables localization and identification of all speakers present in a conversation or meeting. We propose a novel systematic approach to tackle several long-standing challenges in…
Speech activity detection (SAD), which often rests on the fact that the noise is "more" stationary than speech, is particularly challenging in non-stationary environments, because the time variance of the acoustic scene makes it difficult…
Auditory scene analysis (ASA) aims to retrieve information from the acoustic environment, by carrying out three main tasks: sound source location, separation, and classification. These tasks are traditionally executed with a linear data…
Speech separation involves extracting an individual speaker's voice from a multi-speaker audio signal. The increasing complexity of real-world environments, where multiple speakers might converse simultaneously, underscores the importance…