Related papers: Multi-Decoder DPRNN: High Accuracy Source Counting…

Recursive speech separation for unknown number of speakers

In this paper we propose a method of single-channel speaker-independent multi-speaker speech separation for an unknown number of speakers. As opposed to previous works, in which the number of speakers is assumed to be known in advance and…

Sound · Computer Science 2019-09-04 Naoya Takahashi, Sudarsanam Parthasaarathy, Nabarun Goswami, Yuki Mitsufuji

Classification vs. Regression in Supervised Learning for Single Channel Speaker Count Estimation

The task of estimating the maximum number of concurrent speakers from single channel mixtures is important for various audio-based applications, such as blind source separation, speaker diarisation, audio surveillance or auditory scene…

Audio and Speech Processing · Electrical Eng. & Systems 2019-11-05 Fabian-Robert Stöter, Soumitro Chakrabarty, Bernd Edler, Emanuël A. P. Habets

Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR

Most approaches to multi-talker overlapped speech separation and recognition assume that the number of simultaneously active speakers is given, but in realistic situations, it is typically unknown. To cope with this, we extend an iterative…

Audio and Speech Processing · Electrical Eng. & Systems 2020-12-22 Thilo von Neumann, Christoph Boeddeker, Lukas Drude, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach

Coarse-to-Fine Recursive Speech Separation for Unknown Number of Speakers

The vast majority of speech separation methods assume that the number of speakers is known in advance, hence they are specific to the number of speakers. By contrast, a more realistic and challenging task is to separate a mixture in which…

Sound · Computer Science 2022-03-31 Zhenhao Jin, Xiang Hao, Xiangdong Su

Single channel voice separation for unknown number of speakers under reverberant and noisy settings

We present a unified network for voice separation of an unknown number of speakers. The proposed approach is composed of several separation heads optimized together with a speaker classification branch. The separation is carried out in the…

Sound · Computer Science 2020-11-05 Shlomo E. Chazan, Lior Wolf, Eliya Nachmani, Yossi Adi

Attractor-Based Speech Separation of Multiple Utterances by Unknown Number of Speakers

This paper addresses the problem of single-channel speech separation, where the number of speakers is unknown, and each speaker may speak multiple utterances. We propose a speech separation model that simultaneously performs separation,…

Audio and Speech Processing · Electrical Eng. & Systems 2025-05-23 Yuzhu Wang, Archontis Politis, Konstantinos Drossos, Tuomas Virtanen

Improved Source Counting and Separation for Monaural Mixture

Single-channel speech separation in the time domain and frequency domain has been widely studied for voice-driven applications over the past few years. Most previous works assume a known number of speakers in advance, however, which is not…

Audio and Speech Processing · Electrical Eng. & Systems 2020-04-02 Yiming Xiao, Haijian Zhang

Speaker and Direction Inferred Dual-channel Speech Separation

Most speech separation methods, trying to separate all channel sources simultaneously, are still far from having enough generalization capabilities for real scenarios where the number of input sounds is usually uncertain and even dynamic.…

Sound · Computer Science 2021-02-09 Chenxing Li, Jiaming Xu, Nima Mesgarani, Bo Xu

Speaker Diarization: Using Recurrent Neural Networks

Speaker diarization is the problem of separating speakers in an audio recording. There could be any number of speakers, and the final result should state when each speaker starts and ends. In this project, we analyze a given audio file with 2 channels and 2…

Audio and Speech Processing · Electrical Eng. & Systems 2020-06-11 Vishal Sharma, Zekun Zhang, Zachary Neubert, Curtis Dyreson

Multi-channel Speech Separation Using Spatially Selective Deep Non-linear Filters

In a multi-channel separation task with multiple speakers, we aim to recover all individual speech signals from the mixture. In contrast to single-channel approaches, which rely on the different spectro-temporal characteristics of the…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-11 Kristina Tesch, Timo Gerkmann

Towards Real-Time Single-Channel Speech Separation in Noisy and Reverberant Environments

Real-time single-channel speech separation aims to unmix an audio stream captured from a single microphone that contains multiple people talking at once, environmental noise, and reverberation into multiple de-reverberated and noise-free…

Audio and Speech Processing · Electrical Eng. & Systems 2023-04-18 Julian Neri, Sebastian Braun

Unsupervised Single-Channel Audio Separation with Diffusion Source Priors

Single-channel audio separation aims to separate individual sources from a single-channel mixture. Most existing methods rely on supervised learning with synthetically generated paired data. However, obtaining high-quality paired data in…

Audio and Speech Processing · Electrical Eng. & Systems 2025-12-24 Runwu Shi, Chang Li, Jiang Wang, Rui Zhang, Nabeela Khan, Benjamin Yen, Takeshi Ashizawa, Kazuhiro Nakadai

Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model

In typical multi-talker speech recognition systems, a neural network-based acoustic model predicts senone state posteriors for each speaker. These are later used by a single-talker decoder which is applied on each speaker-specific output…

Audio and Speech Processing · Electrical Eng. & Systems 2022-04-18 Martin Kocour, Kateřina Žmolíková, Lucas Ondel, Ján Švec, Marc Delcroix, Tsubasa Ochiai, Lukáš Burget, Jan Černocký

High-Resolution Speaker Counting In Reverberant Rooms Using CRNN With Ambisonics Features

Speaker counting is the task of estimating the number of people that are simultaneously speaking in an audio recording. For several audio processing tasks such as speaker diarization, separation, localization and tracking, knowing the…

Sound · Computer Science 2020-03-18 Pierre-Amaury Grumiaux, Srdjan Kitic, Laurent Girin, Alexandre Guérin

A Unified Framework for Speech Separation

Speech separation refers to extracting each individual speech source in a given mixed signal. Recent advancements in speech separation and ongoing research in this area, have made these approaches as promising techniques for pre-processing…

Machine Learning · Computer Science 2019-12-18 Fahimeh Bahmaninezhad, Shi-Xiong Zhang, Yong Xu, Meng Yu, John H. L. Hansen, Dong Yu

Boosting Unknown-number Speaker Separation with Transformer Decoder-based Attractor

We propose a novel speech separation model designed to separate mixtures with an unknown number of speakers. The proposed model stacks 1) a dual-path processing block that can model spectro-temporal patterns, 2) a transformer decoder-based…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-24 Younglo Lee, Shukjae Choi, Byeong-Yeol Kim, Zhong-Qiu Wang, Shinji Watanabe

Voice Separation with an Unknown Number of Multiple Speakers

We present a new method for separating a mixed audio sequence, in which multiple voices speak simultaneously. The new method employs gated neural networks that are trained to separate the voices at multiple processing steps, while…

Audio and Speech Processing · Electrical Eng. & Systems 2020-09-02 Eliya Nachmani, Yossi Adi, Lior Wolf

SepIt: Approaching a Single Channel Speech Separation Bound

We present an upper bound for the Single Channel Speech Separation task, which is based on an assumption regarding the nature of short segments of speech. Using the bound, we are able to show that while the recent methods have made…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-23 Shahar Lutati, Eliya Nachmani, Lior Wolf

Distributed speech separation in spatially unconstrained microphone arrays

Speech separation with several speakers is a challenging task because of the non-stationarity of the speech and the strong signal similarity between interferent sources. Current state-of-the-art solutions can separate well the different…

Signal Processing · Electrical Eng. & Systems 2021-02-09 Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid

Dual-Path Filter Network: Speaker-Aware Modeling for Speech Separation

Speech separation has been extensively studied to deal with the cocktail party problem in recent years. All related approaches can be divided into two categories: time-frequency domain methods and time domain methods. In addition, some…

Audio and Speech Processing · Electrical Eng. & Systems 2022-03-31 Fan-Lin Wang, Yu-Huai Peng, Hung-Shin Lee, Hsin-Min Wang