Related papers: Surrogate Source Model Learning for Determined Sou…

Self-Supervised Learning-Based Source Separation for Meeting Data

Source separation can improve automatic speech recognition (ASR) under multi-party meeting scenarios by extracting single-speaker signals from overlapped speech. Despite the success of self-supervised learning models in single-channel…

Audio and Speech Processing · Electrical Eng. & Systems 2023-04-04 Yuang Li, Xianrui Zheng, Philip C. Woodland

Independent Vector Analysis via Log-Quadratically Penalized Quadratic Minimization

We propose a new algorithm for blind source separation (BSS) using independent vector analysis (IVA). This is an improvement over the popular auxiliary function based IVA (AuxIVA) with iterative projection (IP) or iterative source steering…

Signal Processing · Electrical Eng. & Systems 2021-05-20 Robin Scheibler

Independent Vector Analysis with Deep Neural Network Source Priors

This paper studies the density priors for independent vector analysis (IVA) with convolutive speech mixture separation as the exemplary application. Most existing source priors for IVA are too simplified to capture the fine structures of…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-07 Xi-Lin Li

Robust Target Speaker Diarization and Separation via Augmented Speaker Embedding Sampling

Traditional speech separation and speaker diarization approaches rely on prior knowledge of target speakers or a predetermined number of participants in audio signals. To address these limitations, recent advances focus on developing…

Sound · Computer Science 2025-08-11 Md Asif Jalal, Luca Remaggi, Vasileios Moschopoulos, Thanasis Kotsiopoulos, Vandana Rajan, Karthikeyan Saravanan, Anastasis Drosou, Junho Heo, Hyuk Oh, Seokyeong Jeong

Semi-blind source separation using convolutive transfer function for nonlinear acoustic echo cancellation

The recently proposed semi-blind source separation (SBSS) method for nonlinear acoustic echo cancellation (NAEC) outperforms adaptive NAEC in attenuating the nonlinear acoustic echo. However, the multiplicative transfer function (MTF)…

Audio and Speech Processing · Electrical Eng. & Systems 2023-01-18 Guoliang Cheng, Lele Liao, Kai Chen, Yuxiang Hu, Changbao Zhu, Jing Lu

A Comparison and Combination of Unsupervised Blind Source Separation Techniques

Unsupervised blind source separation methods do not require a training phase and thus cannot suffer from a train-test mismatch, which is a common concern in neural network based source separation. The unsupervised techniques can be…

Sound · Computer Science 2021-06-11 Christoph Boeddeker, Frederik Rautenberg, Reinhold Haeb-Umbach

Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition

In this paper, we propose a novel auxiliary loss function for target-speaker automatic speech recognition (ASR). Our method automatically extracts and transcribes target speaker's utterances from a monaural mixture of multiple speakers…

Computation and Language · Computer Science 2019-06-27 Naoyuki Kanda, Shota Horiguchi, Ryoichi Takashima, Yusuke Fujita, Kenji Nagamatsu, Shinji Watanabe

Independence-based Joint Dereverberation and Separation with Neural Source Model

We propose an independence-based joint dereverberation and separation method with a neural source model. We introduce a neural network in the framework of time-decorrelation iterative source steering, which is an extension of independent…

Audio and Speech Processing · Electrical Eng. & Systems 2022-04-04 Kohei Saijo, Robin Scheibler

Self-Supervised Adaptive AV Fusion Module for Pre-Trained ASR Models

Automatic speech recognition (ASR) has reached a level of accuracy in recent years, that even outperforms humans in transcribing speech to text. Nevertheless, all current ASR approaches show a certain weakness against ambient noise. To…

Sound · Computer Science 2023-12-22 Christopher Simic, Tobias Bocklet

Accelerating Auxiliary Function-based Independent Vector Analysis

Independent Vector Analysis (IVA) is an effective approach for Blind Source Separation (BSS) of convolutive mixtures of audio signals. As a practical realization of an IVA-based BSS algorithm, the so-called AuxIVA update rules based on the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-09-22 Andreas Brendel, Walter Kellermann

On permutation invariant training for speech source separation

We study permutation invariant training (PIT), which targets at the permutation ambiguity problem for speaker independent source separation models. We extend two state-of-the-art PIT strategies. First, we look at the two-stage speaker…

Sound · Computer Science 2021-04-06 Xiaoyu Liu, Jordi Pons

Short-time deep-learning based source separation for speech enhancement in reverberant environments with beamforming

The source separation-based speech enhancement problem with multiple beamforming in reverberant indoor environments is addressed in this paper. We propose that more generic solutions should cope with time-varying dynamic scenarios with…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-05 Alejandro Díaz, Diego Pincheira, Rodrigo Mahu, Nestor Becerra Yoma

Unsupervised Audio Source Separation using Generative Priors

State-of-the-art under-determined audio source separation systems rely on supervised end-end training of carefully tailored neural network architectures operating either in the time or the spectral domain. However, these methods are…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-29 Vivek Narayanaswamy, Jayaraman J. Thiagarajan, Rushil Anirudh, Andreas Spanias

End-to-End Multi-speaker ASR with Independent Vector Analysis

We develop an end-to-end system for multi-channel, multi-speaker automatic speech recognition. We propose a frontend for joint source separation and dereverberation based on the independent vector analysis (IVA) paradigm. It uses the fast…

Audio and Speech Processing · Electrical Eng. & Systems 2022-04-04 Robin Scheibler, Wangyou Zhang, Xuankai Chang, Shinji Watanabe, Yanmin Qian

Progressive Joint Modeling in Unsupervised Single-channel Overlapped Speech Recognition

Unsupervised single-channel overlapped speech recognition is one of the hardest problems in automatic speech recognition (ASR). Permutation invariant training (PIT) is a state of the art model-based approach, which applies a single neural…

Computation and Language · Computer Science 2017-12-27 Zhehuai Chen, Jasha Droppo, Jinyu Li, Wayne Xiong

Deep Learning based Multi-Source Localization with Source Splitting and its Effectiveness in Multi-Talker Speech Recognition

Multi-source localization is an important and challenging technique for multi-talker conversation analysis. This paper proposes a novel supervised learning method using deep neural networks to estimate the direction of arrival (DOA) of all…

Audio and Speech Processing · Electrical Eng. & Systems 2021-11-30 Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu

Convolutive Transfer Function Invariant SDR training criteria for Multi-Channel Reverberant Speech Separation

Time-domain training criteria have proven to be very effective for the separation of single-channel non-reverberant speech mixtures. Likewise, mask-based beamforming has shown impressive performance in multi-channel reverberant speech…

Sound · Computer Science 2021-06-09 Christoph Boeddeker, Wangyou Zhang, Tomohiro Nakatani, Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Naoyuki Kamo, Yanmin Qian, Reinhold Haeb-Umbach

DIVE: End-to-end Speech Diarization via Iterative Speaker Embedding

We introduce DIVE, an end-to-end speaker diarization algorithm. Our neural algorithm presents the diarization task as an iterative process: it repeatedly builds a representation for each speaker before predicting the voice activity of each…

Sound · Computer Science 2021-05-31 Neil Zeghidour, Olivier Teboul, David Grangier

Unsupervised Music Source Separation Using Differentiable Parametric Source Models

Supervised deep learning approaches to underdetermined audio source separation achieve state-of-the-art performance but require a dataset of mixtures along with their corresponding isolated source signals. Such datasets can be extremely…

Sound · Computer Science 2023-02-01 Kilian Schulze-Forster, Gaël Richard, Liam Kelley, Clement S. J. Doire, Roland Badeau

Transfer Learning for Robust Low-Resource Children's Speech ASR with Transformers and Source-Filter Warping

Automatic Speech Recognition (ASR) systems are known to exhibit difficulties when transcribing children's speech. This can mainly be attributed to the absence of large children's speech corpora to train robust ASR models and the resulting…

Audio and Speech Processing · Electrical Eng. & Systems 2022-06-22 Jenthe Thienpondt, Kris Demuynck