Related papers: Multi-scenario deep learning for multi-speaker sou…

200 papers

Recently, there has been growing interest in multi-speaker speech recognition, where the utterances of multiple speakers are recognized from their mixture. Promising techniques have been proposed for this task, but earlier works have…

Sound · Computer Science 2018-05-16 Hiroshi Seki, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux, John R. Hershey

Speech separation with several speakers is a challenging task because of the non-stationarity of the speech and the strong signal similarity between interferent sources. Current state-of-the-art solutions can separate well the different…

Signal Processing · Electrical Eng. & Systems 2021-02-09 Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid

Single-channel speech separation is a crucial task for enhancing speech recognition systems in multi-speaker environments. This paper investigates the robustness of state-of-the-art Neural Network models in scenarios where the pitch…

Audio and Speech Processing · Electrical Eng. & Systems 2024-07-23 Bunlong Lay, Sebastian Zaczek, Kristina Tesch, Timo Gerkmann

This paper addresses the challenge of speaker separation, which remains an active research topic despite the promising results achieved in recent years. These results, however, often degrade in real recording conditions due to the presence…

Sound · Computer Science 2024-11-14 Rawad Melhem, Assef Jafar, Oumayma Al Dakkak

Despite the overwhelming success of deep learning in various speech processing tasks, the problem of separating simultaneous speakers in a mixture remains challenging. Two major difficulties in such systems are the arbitrary source…

Sound · Computer Science 2017-11-30 Zhuo Chen, Yi Luo, Nima Mesgarani

Lately there have been novel developments in deep learning towards solving the cocktail party problem. Initial results are very promising and allow for more research in the domain. One technique that has not yet been explored in the neural…

Sound · Computer Science 2017-08-30 Jeroen Zegers, Hugo Van hamme

We introduce a sophisticated multi-speaker speech data simulator, specifically engineered to generate multi-speaker speech recordings. A notable feature of this simulator is its capacity to modulate the distribution of silence and overlap…

Audio and Speech Processing · Electrical Eng. & Systems 2023-10-20 Tae Jin Park, He Huang, Coleman Hooper, Nithin Koluguri, Kunal Dhawan, Ante Jukic, Jagadeesh Balam, Boris Ginsburg

This paper examines the applicability in realistic scenarios of two deep learning based solutions to the overlapping speaker separation problem. Firstly, we present experiments that show that these methods are applicable for a broad range…

Machine Learning · Computer Science 2019-12-20 Pieter Appeltans, Jeroen Zegers, Hugo Van hamme

The task of estimating the maximum number of concurrent speakers from single channel mixtures is important for various audio-based applications, such as blind source separation, speaker diarisation, audio surveillance or auditory scene…

Audio and Speech Processing · Electrical Eng. & Systems 2019-11-05 Fabian-Robert Stöter, Soumitro Chakrabarty, Bernd Edler, Emanuël A. P. Habets

Real-world sound scenes consist of time-varying collections of sound sources, each generating characteristic sound events that are mixed together in audio recordings. The association of these constituent sound events with their mixture and…

Recent progress in generative AI has made it increasingly easy to create natural-sounding deepfake speech from just a few seconds of audio. While these tools support helpful applications, they also raise serious concerns by making it…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-07 Xi Xuan, Yang Xiao, Rohan Kumar Das, Tomi Kinnunen

Building a persona-based conversation agent is challenging owing to the lack of large amounts of speaker-specific conversation data for model training. This paper addresses the problem by proposing a multi-task learning approach to training…

Computation and Language · Computer Science 2017-10-23 Yi Luan, Chris Brockett, Bill Dolan, Jianfeng Gao, Michel Galley

Many approaches can derive information about a single speaker's identity from the speech by learning to recognize consistent characteristics of acoustic parameters. However, it is challenging to determine identity information when there are…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-07 Hyewon Han, Soo-Whan Chung, Hong-Goo Kang

Speech separation has been extensively explored to tackle the cocktail party problem. However, these studies are still far from having enough generalization capabilities for real scenarios. In this work, we raise a common strategy named…

Audio and Speech Processing · Electrical Eng. & Systems 2020-06-26 Jing Shi, Jiaming Xu, Yusuke Fujita, Shinji Watanabe, Bo Xu

The recent advances in deep learning are mostly driven by the availability of large amounts of training data. However, such data is not always available for specific tasks such as speaker recognition, where collection of large…

Audio and Speech Processing · Electrical Eng. & Systems 2019-04-19 Prashant Anand, Ajeet Kumar Singh, Siddharth Srivastava, Brejesh Lall

Speaker individuality information is among the most critical elements within speech signals. By thoroughly and accurately modeling this information, it can be utilized in various intelligent speech applications, such as speaker recognition,…

Audio and Speech Processing · Electrical Eng. & Systems 2024-11-05 Shuai Wang, Zhengyang Chen, Kong Aik Lee, Yanmin Qian, Haizhou Li

Multi-channel acoustic signal processing is a well-established and powerful tool to exploit the spatial diversity between a target signal and non-target or noise sources for signal enhancement. However, the textbook solutions for optimal…

Audio and Speech Processing · Electrical Eng. & Systems 2025-01-14 Reinhold Haeb-Umbach, Tomohiro Nakatani, Marc Delcroix, Christoph Boeddeker, Tsubasa Ochiai

Deep learning models have been used for a wide variety of tasks. They are prevalent in computer vision, natural language processing, speech recognition, and other areas. While these models have worked well under many scenarios, it has been…

Machine Learning · Computer Science 2022-02-15 Daniel Steinberg, Paul Munro

End-to-end speaker diarization enables accurate overlap-aware diarization by jointly estimating multiple speakers' speech activities in parallel. This approach is data-hungry, requiring a large amount of labeled conversational data, which…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-02 Shota Horiguchi, Atsushi Ando, Marc Delcroix, Naohiro Tawara

Speaker diarization is a task to label audio or video recordings with classes that correspond to speaker identity, or in short, a task to identify "who spoke when". In the early years, speaker diarization algorithms were developed for…

Audio and Speech Processing · Electrical Eng. & Systems 2021-11-29 Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu J. Han, Shinji Watanabe, Shrikanth Narayanan