Related papers: Improving Source Separation via Multi-Speaker Repr…

Speaker-independent Speech Separation with Deep Attractor Network

Despite the recent success of deep learning for many speech processing tasks, single-microphone, speaker-independent speech separation remains challenging for two main reasons. The first reason is the arbitrary order of the target and…

Sound · Computer Science 2018-04-19 Yi Luo, Zhuo Chen, Nima Mesgarani

Accent and Speaker Disentanglement in Many-to-many Voice Conversion

This paper proposes an interesting voice and accent joint conversion approach, which can convert an arbitrary source speaker's voice to a target speaker with non-native accent. This problem is challenging as each target speaker only has…

Sound · Computer Science 2020-11-18 Zhichao Wang, Wenshuo Ge, Xiong Wang, Shan Yang, Wendong Gan, Haitao Chen, Hai Li, Lei Xie, Xiulin Li

Monaural Audio Speaker Separation with Source Contrastive Estimation

We propose an algorithm to separate simultaneously speaking persons from each other, the "cocktail party problem", using a single microphone. Our approach involves a deep recurrent neural networks regression to a vector space that is…

Sound · Computer Science 2017-05-22 Cory Stephenson, Patrick Callier, Abhinav Ganesh, Karl Ni

Cracking the cocktail party problem by multi-beam deep attractor network

While recent progresses in neural network approaches to single-channel speech separation, or more generally the cocktail party problem, achieved significant improvement, their performance for complex mixtures is still not satisfactory. In…

Sound · Computer Science 2018-03-30 Zhuo Chen, Jinyu Li, Xiong Xiao, Takuya Yoshioka, Huaming Wang, Zhenghao Wang, Yifan Gong

Probabilistic Binary-Mask Cocktail-Party Source Separation in a Convolutional Deep Neural Network

Separation of competing speech is a key challenge in signal processing and a feat routinely performed by the human auditory brain. A long standing benchmark of the spectrogram approach to source separation is known as the ideal binary mask.…

Sound · Computer Science 2015-03-25 Andrew J. R. Simpson

Single-Channel Multi-Speaker Separation using Deep Clustering

Deep clustering is a recently introduced deep learning architecture that uses discriminatively trained embeddings as the basis for clustering. It was recently applied to spectrogram segmentation, resulting in impressive results on…

Machine Learning · Computer Science 2016-07-11 Yusuf Isik, Jonathan Le Roux, Zhuo Chen, Shinji Watanabe, John R. Hershey

Speaker Re-identification with Speaker Dependent Speech Enhancement

While the use of deep neural networks has significantly boosted speaker recognition performance, it is still challenging to separate speakers in poor acoustic environments. Here speech enhancement methods have traditionally allowed improved…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-28 Yanpei Shi, Qiang Huang, Thomas Hain

Speaker and Direction Inferred Dual-channel Speech Separation

Most speech separation methods, trying to separate all channel sources simultaneously, are still far from having enough generalization capabilities for real scenarios where the number of input sounds is usually uncertain and even dynamic.…

Sound · Computer Science 2021-02-09 Chenxing Li, Jiaming Xu, Nima Mesgarani, Bo Xu

Monaural Multi-Speaker Speech Separation Using Efficient Transformer Model

Cocktail party problem is the scenario where it is difficult to separate or distinguish individual speaker from a mixed speech from several speakers. There have been several researches going on in this field but the size and complexity of…

Sound · Computer Science 2026-02-19 S. Rijal, R. Neupane, S. P. Mainali, S. K. Regmi, S. Maharjan

Single-Microphone Speech Enhancement and Separation Using Deep Learning

The cocktail party problem comprises the challenging task of understanding a speech signal in a complex acoustic environment, where multiple speakers and background noise signals simultaneously interfere with the speech signal of interest.…

Sound · Computer Science 2018-12-05 Morten Kolbæk

Multi-scenario deep learning for multi-speaker source separation

Research in deep learning for multi-speaker source separation has received a boost in the last years. However, most studies are restricted to mixtures of a specific number of speakers, called a specific scenario. While some works included…

Machine Learning · Computer Science 2018-08-27 Jeroen Zegers, Hugo Van hamme

MIRNet: Learning multiple identities representations in overlapped speech

Many approaches can derive information about a single speaker's identity from the speech by learning to recognize consistent characteristics of acoustic parameters. However, it is challenging to determine identity information when there are…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-07 Hyewon Han, Soo-Whan Chung, Hong-Goo Kang

Audio-visual Speech Separation with Adversarially Disentangled Visual Representation

Speech separation aims to separate individual voice from an audio mixture of multiple simultaneous talkers. Although audio-only approaches achieve satisfactory performance, they build on a strategy to handle the predefined conditions,…

Sound · Computer Science 2020-12-01 Peng Zhang, Jiaming Xu, Jing Shi, Yunzhe Hao, Bo Xu

Speaker-Conditional Chain Model for Speech Separation and Extraction

Speech separation has been extensively explored to tackle the cocktail party problem. However, these studies are still far from having enough generalization capabilities for real scenarios. In this work, we raise a common strategy named…

Audio and Speech Processing · Electrical Eng. & Systems 2020-06-26 Jing Shi, Jiaming Xu, Yusuke Fujita, Shinji Watanabe, Bo Xu

Speaker Characterization by means of Attention Pooling

State-of-the-art Deep Learning systems for speaker verification are commonly based on speaker embedding extractors. These architectures are usually composed of a feature extractor front-end together with a pooling layer to encode…

Audio and Speech Processing · Electrical Eng. & Systems 2024-05-08 Federico Costa, Miquel India, Javier Hernando

Leveraging Speaker Embeddings with Adversarial Multi-task Learning for Age Group Classification

Recently, researchers have utilized neural network-based speaker embedding techniques in speaker-recognition tasks to identify speakers accurately. However, speaker-discriminative embeddings do not always represent speech features such as…

Audio and Speech Processing · Electrical Eng. & Systems 2023-01-24 Kwangje Baeg, Yeong-Gwan Kim, Young-Sub Han, Byoung-Ki Jeon

Separate And Diffuse: Using a Pretrained Diffusion Model for Improving Source Separation

The problem of speech separation, also known as the cocktail party problem, refers to the task of isolating a single speech signal from a mixture of speech signals. Previous work on source separation derived an upper bound for the source…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-27 Shahar Lutati, Eliya Nachmani, Lior Wolf

Identify Speakers in Cocktail Parties with End-to-End Attention

In scenarios where multiple speakers talk at the same time, it is important to be able to identify the talkers accurately. This paper presents an end-to-end system that integrates speech source extraction and speaker identification, and…

Audio and Speech Processing · Electrical Eng. & Systems 2020-08-11 Junzhe Zhu, Mark Hasegawa-Johnson, Leda Sari

Investigation into Target Speaking Rate Adaptation for Voice Conversion

Disentangling speaker and content attributes of a speech signal into separate latent representations followed by decoding the content with an exchanged speaker representation is a popular approach for voice conversion, which can be trained…

Audio and Speech Processing · Electrical Eng. & Systems 2022-09-07 Michael Kuhlmann, Fritz Seebauer, Janek Ebbers, Petra Wagner, Reinhold Haeb-Umbach

Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches

Recently, end-to-end speaker extraction has attracted increasing attention and shown promising results. However, its performance is often inferior to that of a blind source separation (BSS) counterpart with a similar network architecture,…

Audio and Speech Processing · Electrical Eng. & Systems 2022-04-05 Zifeng Zhao, Dongchao Yang, Rongzhi Gu, Haoran Zhang, Yuexian Zou