Related papers: Surrogate Source Model Learning for Determined Sou…

200 papers

Source separation can improve automatic speech recognition (ASR) under multi-party meeting scenarios by extracting single-speaker signals from overlapped speech. Despite the success of self-supervised learning models in single-channel…

Audio and Speech Processing · Electrical Eng. & Systems 2023-04-04 Yuang Li, Xianrui Zheng, Philip C. Woodland

We propose a new algorithm for blind source separation (BSS) using independent vector analysis (IVA). This is an improvement over the popular auxiliary function based IVA (AuxIVA) with iterative projection (IP) or iterative source steering…

Signal Processing · Electrical Eng. & Systems 2021-05-20 Robin Scheibler

This paper studies the density priors for independent vector analysis (IVA) with convolutive speech mixture separation as the exemplary application. Most existing source priors for IVA are too simplified to capture the fine structures of…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-07 Xi-Lin Li

Traditional speech separation and speaker diarization approaches rely on prior knowledge of target speakers or a predetermined number of participants in audio signals. To address these limitations, recent advances focus on developing…

The recently proposed semi-blind source separation (SBSS) method for nonlinear acoustic echo cancellation (NAEC) outperforms adaptive NAEC in attenuating the nonlinear acoustic echo. However, the multiplicative transfer function (MTF)…

Audio and Speech Processing · Electrical Eng. & Systems 2023-01-18 Guoliang Cheng, Lele Liao, Kai Chen, Yuxiang Hu, Changbao Zhu, Jing Lu

Unsupervised blind source separation methods do not require a training phase and thus cannot suffer from a train-test mismatch, which is a common concern in neural network based source separation. The unsupervised techniques can be…

Sound · Computer Science 2021-06-11 Christoph Boeddeker, Frederik Rautenberg, Reinhold Haeb-Umbach

In this paper, we propose a novel auxiliary loss function for target-speaker automatic speech recognition (ASR). Our method automatically extracts and transcribes the target speaker's utterances from a monaural mixture of multiple speakers…

Computation and Language · Computer Science 2019-06-27 Naoyuki Kanda, Shota Horiguchi, Ryoichi Takashima, Yusuke Fujita, Kenji Nagamatsu, Shinji Watanabe

We propose an independence-based joint dereverberation and separation method with a neural source model. We introduce a neural network in the framework of time-decorrelation iterative source steering, which is an extension of independent…

Audio and Speech Processing · Electrical Eng. & Systems 2022-04-04 Kohei Saijo, Robin Scheibler

In recent years, automatic speech recognition (ASR) has reached a level of accuracy that even outperforms humans in transcribing speech to text. Nevertheless, all current ASR approaches show a certain weakness against ambient noise. To…

Sound · Computer Science 2023-12-22 Christopher Simic, Tobias Bocklet

Independent Vector Analysis (IVA) is an effective approach for Blind Source Separation (BSS) of convolutive mixtures of audio signals. As a practical realization of an IVA-based BSS algorithm, the so-called AuxIVA update rules based on the…

Audio and Speech Processing · Electrical Eng. & Systems 2020-09-22 Andreas Brendel, Walter Kellermann

We study permutation invariant training (PIT), which targets the permutation ambiguity problem for speaker-independent source separation models. We extend two state-of-the-art PIT strategies. First, we look at the two-stage speaker…

Sound · Computer Science 2021-04-06 Xiaoyu Liu, Jordi Pons

The source separation-based speech enhancement problem with multiple beamforming in reverberant indoor environments is addressed in this paper. We propose that more generic solutions should cope with time-varying dynamic scenarios with…

Audio and Speech Processing · Electrical Eng. & Systems 2020-11-05 Alejandro Díaz, Diego Pincheira, Rodrigo Mahu, Nestor Becerra Yoma

State-of-the-art under-determined audio source separation systems rely on supervised end-to-end training of carefully tailored neural network architectures operating either in the time or the spectral domain. However, these methods are…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-29 Vivek Narayanaswamy, Jayaraman J. Thiagarajan, Rushil Anirudh, Andreas Spanias

We develop an end-to-end system for multi-channel, multi-speaker automatic speech recognition. We propose a frontend for joint source separation and dereverberation based on the independent vector analysis (IVA) paradigm. It uses the fast…

Audio and Speech Processing · Electrical Eng. & Systems 2022-04-04 Robin Scheibler, Wangyou Zhang, Xuankai Chang, Shinji Watanabe, Yanmin Qian

Unsupervised single-channel overlapped speech recognition is one of the hardest problems in automatic speech recognition (ASR). Permutation invariant training (PIT) is a state-of-the-art model-based approach, which applies a single neural…

Computation and Language · Computer Science 2017-12-27 Zhehuai Chen, Jasha Droppo, Jinyu Li, Wayne Xiong

Multi-source localization is an important and challenging technique for multi-talker conversation analysis. This paper proposes a novel supervised learning method using deep neural networks to estimate the direction of arrival (DOA) of all…

Audio and Speech Processing · Electrical Eng. & Systems 2021-11-30 Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu

Time-domain training criteria have proven to be very effective for the separation of single-channel non-reverberant speech mixtures. Likewise, mask-based beamforming has shown impressive performance in multi-channel reverberant speech…

We introduce DIVE, an end-to-end speaker diarization algorithm. Our neural algorithm presents the diarization task as an iterative process: it repeatedly builds a representation for each speaker before predicting the voice activity of each…

Sound · Computer Science 2021-05-31 Neil Zeghidour, Olivier Teboul, David Grangier

Supervised deep learning approaches to underdetermined audio source separation achieve state-of-the-art performance but require a dataset of mixtures along with their corresponding isolated source signals. Such datasets can be extremely…

Automatic Speech Recognition (ASR) systems are known to exhibit difficulties when transcribing children's speech. This can mainly be attributed to the absence of large children's speech corpora to train robust ASR models and the resulting…

Audio and Speech Processing · Electrical Eng. & Systems 2022-06-22 Jenthe Thienpondt, Kris Demuynck