Related papers: Spatial Aware Multi-Task Learning Based Speech Sep…

Efficient Area-based and Speaker-Agnostic Source Separation

This paper introduces an area-based source separation method designed for virtual meeting scenarios. The aim is to preserve speech signals from an unspecified number of sources within a defined spatial area in front of a linear microphone…

Audio and Speech Processing · Electrical Eng. & Systems 2024-08-20 Martin Strauss, Okan Köpüklü

Learning-based Robust Speaker Counting and Separation with the Aid of Spatial Coherence

A three-stage approach is proposed for speaker counting and speech separation in noisy and reverberant environments. In the spatial feature extraction, a spatial coherence matrix (SCM) is computed using whitened relative transfer functions…

Audio and Speech Processing · Electrical Eng. & Systems 2023-08-08 Yicheng Hsu, Mingsian Bai

Distributed speech separation in spatially unconstrained microphone arrays

Speech separation with several speakers is a challenging task because of the non-stationarity of the speech and the strong signal similarity between interferent sources. Current state-of-the-art solutions can separate well the different…

Signal Processing · Electrical Eng. & Systems 2021-02-09 Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid

Time-Domain Speech Extraction with Spatial Information and Multi Speaker Conditioning Mechanism

In this paper, we present a novel multi-channel speech extraction system to simultaneously extract multiple clean individual sources from a mixture in noisy and reverberant environments. The proposed method is built on an improved…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-17 Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker

Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds

Humans can robustly recognize and localize objects by integrating visual and auditory cues. While machines are able to do the same now with images, less work has been done with sounds. This work develops an approach for dense semantic…

Computer Vision and Pattern Recognition · Computer Science 2020-03-10 Arun Balajee Vasudevan, Dengxin Dai, Luc Van Gool

SSNAPS: Audio-Visual Separation of Speech and Background Noise with Diffusion Inverse Sampling

This paper addresses the challenge of audio-visual single-microphone speech separation and enhancement in the presence of real-world environmental noise. Our approach is based on generative inverse sampling, where we model clean speech and…

Audio and Speech Processing · Electrical Eng. & Systems 2026-02-03 Yochai Yemini, Yoav Ellinson, Rami Ben-Ari, Sharon Gannot, Ethan Fetaya

Learning to Separate Voices by Spatial Regions

We consider the problem of audio voice separation for binaural applications, such as earphones and hearing aids. While today's neural networks perform remarkably well (separating $4+$ sources with 2 microphones) they assume a known or fixed…

Sound · Computer Science 2022-07-18 Zhongweiyang Xu, Romit Roy Choudhury

Multi-Microphone Speaker Separation by Spatial Regions

We consider the task of region-based source separation of reverberant multi-microphone recordings. We assume pre-defined spatial regions with a single active source per region. The objective is to estimate the signals from the individual…

Audio and Speech Processing · Electrical Eng. & Systems 2023-03-14 Julian Wechsler, Srikanth Raj Chetupalli, Wolfgang Mack, Emanuël A. P. Habets

ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings

Speaker Diarization (SD) aims at grouping speech segments that belong to the same speaker. This task is required in many speech-processing applications, such as rich meeting transcription. In this context, distant microphone arrays usually…

Sound · Computer Science 2024-06-06 Theo Mariotte, Anthony Larcher, Silvio Montresor, Jean-Hugh Thomas

SADDEL: Joint Speech Separation and Denoising Model based on Multitask Learning

Speech data collected in real-world scenarios often encounters two issues. First, multiple sources may exist simultaneously, and the number of sources may vary with time. Second, the existence of background noise in recording is inevitable.…

Sound · Computer Science 2020-05-21 Yuan-Kuei Wu, Chao-I Tuan, Hung-yi Lee, Yu Tsao

Semantic Hearing: Programming Acoustic Scenes with Binaural Hearables

Imagine being able to listen to the birds chirping in a park without hearing the chatter from other hikers, or being able to block out traffic noise on a busy street while still being able to hear emergency sirens and car honks. We…

Sound · Computer Science 2023-11-02 Bandhav Veluri, Malek Itani, Justin Chan, Takuya Yoshioka, Shyamnath Gollakota

Neural Speech Separation Using Spatially Distributed Microphones

This paper proposes a neural network based speech separation method using spatially distributed microphones. Unlike with traditional microphone array settings, neither the number of microphones nor their spatial arrangement is known in…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-01 Dongmei Wang, Zhuo Chen, Takuya Yoshioka

Context-Aware Two-Step Training Scheme for Domain Invariant Speech Separation

Speech separation seeks to isolate individual speech signals from a multi-talk speech mixture. Despite much progress, a system well-trained on synthetic data often experiences performance degradation on out-of-domain data, such as…

Sound · Computer Science 2025-03-18 Wupeng Wang, Zexu Pan, Jingru Lin, Shuai Wang, Haizhou Li

Move2Hear: Active Audio-Visual Source Separation

We introduce the active audio-visual source separation problem, where an agent must move intelligently in order to better isolate the sounds coming from an object of interest in its environment. The agent hears multiple audio sources…

Computer Vision and Pattern Recognition · Computer Science 2021-08-27 Sagnik Majumder, Ziad Al-Halah, Kristen Grauman

Learning-based personal speech enhancement for teleconferencing by exploiting spatial-spectral features

Teleconferencing is becoming essential during the COVID-19 pandemic. However, in real-world applications, speech quality can deteriorate due to, for example, background interference, noise, or reverberation. To solve this problem, target…

Audio and Speech Processing · Electrical Eng. & Systems 2022-05-02 Yicheng Hsu, Yonghan Lee, Mingsian R. Bai

Robust Active Speaker Detection in Noisy Environments

This paper addresses the issue of active speaker detection (ASD) in noisy environments and formulates a robust active speaker detection (rASD) problem. Existing ASD approaches leverage both audio and visual modalities, but non-speech sounds…

Multimedia · Computer Science 2024-04-02 Siva Sai Nagender Vasireddy, Chenxu Zhang, Xiaohu Guo, Yapeng Tian

A Real-time Speaker Diarization System Based on Spatial Spectrum

In this paper we describe a speaker diarization system that enables localization and identification of all speakers present in a conversation or meeting. We propose a novel systematic approach to tackle several long-standing challenges in…

Sound · Computer Science 2021-07-21 Siqi Zheng, Weilong Huang, Xianliang Wang, Hongbin Suo, Jinwei Feng, Zhijie Yan

Statistical and Neural Network Based Speech Activity Detection in Non-Stationary Acoustic Environments

Speech activity detection (SAD), which often rests on the fact that the noise is "more" stationary than speech, is particularly challenging in non-stationary environments, because the time variance of the acoustic scene makes it difficult…

Audio and Speech Processing · Electrical Eng. & Systems 2020-07-29 Jens Heitkaemper, Joerg Schmalenstroeer, Reinhold Haeb-Umbach

Multi-agent Auditory Scene Analysis

Auditory scene analysis (ASA) aims to retrieve information from the acoustic environment, by carrying out three main tasks: sound source location, separation, and classification. These tasks are traditionally executed with a linear data…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-21 Caleb Rascon, Luis Gato-Diaz, Eduardo García-Alarcón

Single-Microphone Speaker Separation and Voice Activity Detection in Noisy and Reverberant Environments

Speech separation involves extracting an individual speaker's voice from a multi-speaker audio signal. The increasing complexity of real-world environments, where multiple speakers might converse simultaneously, underscores the importance…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-09 Renana Opochinsky, Mordehay Moradi, Sharon Gannot