Related papers: Multi-scenario deep learning for multi-speaker sou…

A Purely End-to-end System for Multi-speaker Speech Recognition

Recently, there has been growing interest in multi-speaker speech recognition, where the utterances of multiple speakers are recognized from their mixture. Promising techniques have been proposed for this task, but earlier works have…

Sound · Computer Science · 2018-05-16 · Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, John R. Hershey

Distributed speech separation in spatially unconstrained microphone arrays

Speech separation with several speakers is a challenging task because of the non-stationarity of the speech and the strong signal similarity between interferent sources. Current state-of-the-art solutions can separate well the different…

Signal Processing · Electrical Eng. & Systems · 2021-02-09 · Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid

Robustness of Speech Separation Models for Similar-pitch Speakers

Single-channel speech separation is a crucial task for enhancing speech recognition systems in multi-speaker environments. This paper investigates the robustness of state-of-the-art Neural Network models in scenarios where the pitch…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-07-23 · Bunlong Lay, Sebastian Zaczek, Kristina Tesch, Timo Gerkmann

Developing an Effective Training Dataset to Enhance the Performance of AI-based Speaker Separation Systems

This paper addresses the challenge of speaker separation, which remains an active research topic despite the promising results achieved in recent years. These results, however, often degrade in real recording conditions due to the presence…

Sound · Computer Science · 2024-11-14 · Rawad Melhem, Assef Jafar, Oumayma Al Dakkak

Deep attractor network for single-microphone speaker separation

Despite the overwhelming success of deep learning in various speech processing tasks, the problem of separating simultaneous speakers in a mixture remains challenging. Two major difficulties in such systems are the arbitrary source…

Sound · Computer Science · 2017-11-30 · Zhuo Chen, Yi Luo, Nima Mesgarani

Improving Source Separation via Multi-Speaker Representations

Lately there have been novel developments in deep learning towards solving the cocktail party problem. Initial results are very promising and allow for more research in the domain. One technique that has not yet been explored in the neural…

Sound · Computer Science · 2017-08-30 · Jeroen Zegers, Hugo Van hamme

Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation

We introduce a sophisticated multi-speaker speech data simulator, specifically engineered to generate multi-speaker speech recordings. A notable feature of this simulator is its capacity to modulate the distribution of silence and overlap…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-10-20 · Tae Jin Park, He Huang, Coleman Hooper, Nithin Koluguri, Kunal Dhawan, Ante Jukic, Jagadeesh Balam, Boris Ginsburg

Practical applicability of deep neural networks for overlapping speaker separation

This paper examines the applicability in realistic scenarios of two deep learning based solutions to the overlapping speaker separation problem. Firstly, we present experiments that show that these methods are applicable for a broad range…

Machine Learning · Computer Science · 2019-12-20 · Pieter Appeltans, Jeroen Zegers, Hugo Van hamme

Classification vs. Regression in Supervised Learning for Single Channel Speaker Count Estimation

The task of estimating the maximum number of concurrent speakers from single channel mixtures is important for various audio-based applications, such as blind source separation, speaker diarisation, audio surveillance or auditory scene…

Audio and Speech Processing · Electrical Eng. & Systems · 2019-11-05 · Fabian-Robert Stöter, Soumitro Chakrabarty, Bernd Edler, Emanuël A. P. Habets

Self-Supervised Learning from Automatically Separated Sound Scenes

Real-world sound scenes consist of time-varying collections of sound sources, each generating characteristic sound events that are mixed together in audio recordings. The association of these constituent sound events with their mixture and…

Sound · Computer Science · 2021-09-16 · Eduardo Fonseca, Aren Jansen, Daniel P. W. Ellis, Scott Wisdom, Marco Tagliasacchi, John R. Hershey, Manoj Plakal, Shawn Hershey, R. Channing Moore, Xavier Serra

Multilingual Source Tracing of Speech Deepfakes: A First Benchmark

Recent progress in generative AI has made it increasingly easy to create natural-sounding deepfake speech from just a few seconds of audio. While these tools support helpful applications, they also raise serious concerns by making it…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-08-07 · Xi Xuan, Yang Xiao, Rohan Kumar Das, Tomi Kinnunen

Multi-Task Learning for Speaker-Role Adaptation in Neural Conversation Models

Building a persona-based conversation agent is challenging owing to the lack of large amounts of speaker-specific conversation data for model training. This paper addresses the problem by proposing a multi-task learning approach to training…

Computation and Language · Computer Science · 2017-10-23 · Yi Luan, Chris Brockett, Bill Dolan, Jianfeng Gao, Michel Galley

MIRNet: Learning multiple identities representations in overlapped speech

Many approaches can derive information about a single speaker's identity from the speech by learning to recognize consistent characteristics of acoustic parameters. However, it is challenging to determine identity information when there are…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-08-07 · Hyewon Han, Soo-Whan Chung, Hong-Goo Kang

Speaker-Conditional Chain Model for Speech Separation and Extraction

Speech separation has been extensively explored to tackle the cocktail party problem. However, these studies are still far from having enough generalization capabilities for real scenarios. In this work, we raise a common strategy named…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-06-26 · Jing Shi, Jiaming Xu, Yusuke Fujita, Shinji Watanabe, Bo Xu

Few Shot Speaker Recognition using Deep Neural Networks

The recent advances in deep learning are mostly driven by availability of large amount of training data. However, availability of such data is not always possible for specific tasks such as speaker recognition where collection of large…

Audio and Speech Processing · Electrical Eng. & Systems · 2019-04-19 · Prashant Anand, Ajeet Kumar Singh, Siddharth Srivastava, Brejesh Lall

Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning

Speaker individuality information is among the most critical elements within speech signals. By thoroughly and accurately modeling this information, it can be utilized in various intelligent speech applications, such as speaker recognition,…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-11-05 · Shuai Wang, Zhengyang Chen, Kong Aik Lee, Yanmin Qian, Haizhou Li

Microphone Array Signal Processing and Deep Learning for Speech Enhancement

Multi-channel acoustic signal processing is a well-established and powerful tool to exploit the spatial diversity between a target signal and non-target or noise sources for signal enhancement. However, the textbook solutions for optimal…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-01-14 · Reinhold Haeb-Umbach, Tomohiro Nakatani, Marc Delcroix, Christoph Boeddeker, Tsubasa Ochiai

Measuring the Contribution of Multiple Model Representations in Detecting Adversarial Instances

Deep learning models have been used for a wide variety of tasks. They are prevalent in computer vision, natural language processing, speech recognition, and other areas. While these models have worked well under many scenarios, it has been…

Machine Learning · Computer Science · 2022-02-15 · Daniel Steinberg, Paul Munro

Pretraining Multi-Speaker Identification for Neural Speaker Diarization

End-to-end speaker diarization enables accurate overlap-aware diarization by jointly estimating multiple speakers' speech activities in parallel. This approach is data-hungry, requiring a large amount of labeled conversational data, which…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-06-02 · Shota Horiguchi, Atsushi Ando, Marc Delcroix, Naohiro Tawara

A Review of Speaker Diarization: Recent Advances with Deep Learning

Speaker diarization is a task to label audio or video recordings with classes that correspond to speaker identity, or in short, a task to identify "who spoke when". In the early years, speaker diarization algorithms were developed for…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-11-29 · Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu J. Han, Shinji Watanabe, Shrikanth Narayanan