
Related papers: Conditional Diffusion Model for Target Speaker Ext…

200 papers

We propose DiffSep, a new single channel source separation method based on score-matching of a stochastic differential equation (SDE). We craft a tailored continuous time diffusion-mixing process starting from the separated sources and…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-11-03 · Robin Scheibler, Youna Ji, Soo-Whan Chung, Jaeuk Byun, Soyeon Choe, Min-Seok Choi

Diffusion model-based speech enhancement has received increased attention since it can generate very natural enhanced signals and generalizes well to unseen conditions. Diffusion models have been explored for several sub-tasks of speech…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-08-21 · Naoyuki Kamo, Marc Delcroix, Tomohiro Nakatani

Generative models have attracted considerable attention for speech separation tasks, and among these, diffusion-based methods are being explored. Despite the notable success of diffusion techniques in generation tasks, their adaptation to…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-01-28 · Jinwei Dong, Xinsheng Wang, Qirong Mao

Informed speaker extraction aims to extract a target speech signal from a mixture of sources given prior knowledge about the desired speaker. Recent deep learning-based methods leverage a speaker discriminative model that maps a reference…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-02-17 · Mohamed Elminshawi, Wolfgang Mack, Emanuël A. P. Habets

In this work, we build upon our previous publication and use diffusion-based generative models for speech enhancement. We present a detailed overview of the diffusion process that is based on a stochastic differential equation and delve…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-10-14 · Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Timo Gerkmann

Speaker extraction aims to mimic humans' selective auditory attention by extracting a target speaker's voice from a multi-talker environment. It is common to perform the extraction in frequency-domain, and reconstruct the time-domain signal…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-04-20 · Chenglin Xu, Wei Rao, Eng Siong Chng, Haizhou Li

Diffusion models have gained attention in speech enhancement tasks, providing an alternative to conventional discriminative methods. However, research on target speech extraction under multi-speaker noisy conditions remains relatively…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-10-08 · Leying Zhang, Yao Qian, Linfeng Yu, Heming Wang, Hemin Yang, Long Zhou, Shujie Liu, Yanmin Qian

Recently, conditional score-based diffusion models have gained significant attention in the field of supervised speech enhancement, yielding state-of-the-art performance. However, these methods may face challenges when generalising to…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-20 · Berné Nortier, Mostafa Sadeghi, Romain Serizel

Speaker extraction aims to extract the target speech signal from a multi-talker environment given a target speaker's reference speech. We recently proposed a time-domain solution, SpEx, that avoids the phase estimation in frequency-domain…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-08-19 · Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li

Speaker extraction aims to extract the target speaker's voice from a multi-talker speech mixture given an auxiliary reference utterance. Recent studies show that speaker extraction benefits from the location or direction of the target…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-02-22 · Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li

We study variance reduction for score estimation and diffusion-based sampling in settings where the clean (target) score is available or can be approximated. Starting from the Target Score Identity (TSI), which expresses the noisy marginal…

Machine Learning · Statistics · 2026-01-26 · Alois Duston, Tan Bui-Thanh

This paper introduces an audio-visual speech enhancement system that leverages score-based generative models, also known as diffusion models, conditioned on visual information. In particular, we exploit audio-visual embeddings obtained from…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-06-05 · Julius Richter, Simone Frintrop, Timo Gerkmann

Target Speaker Extraction (TSE) uses a reference cue to extract the target speech from a mixture. In TSE systems relying on audio cues, the speaker embedding from the enrolled speech is crucial to performance. However, these embeddings may…

Sound · Computer Science · 2025-08-12 · Shu Wu, Anbin Qi, Yanzhang Xie, Xiang Xie

Speech separation is a fundamental task in audio processing, typically addressed with fully supervised systems trained on paired mixtures. While effective, such systems typically rely on synthetic data pipelines, which may not reflect…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-09-30 · Runwu Shi, Kai Li, Chang Li, Jiang Wang, Sihan Tan, Kazuhiro Nakadai

Separating the individual elements in a musical mixture is an essential process for music analysis and practice. While this is generally addressed using neural networks optimized to mask or transform the time-frequency representation of a…

Sound · Computer Science · 2025-11-27 · Genís Plaja-Roglans, Yun-Ning Hung, Xavier Serra, Igor Pereira

A primary challenge when deploying speaker recognition systems in real-world applications is performance degradation caused by environmental mismatch. We propose a diffusion-based method that takes speaker embeddings extracted from a…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-05-23 · KiHyun Nam, Jungwoo Heo, Jee-weon Jung, Gangin Park, Chaeyoung Jung, Ha-Jin Yu, Joon Son Chung

Diffusion-based generative models have recently achieved remarkable results in speech and vocal enhancement due to their ability to model complex speech data distributions. While these models generalize well to unseen acoustic environments,…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-09-23 · Yudong Yang, Zhan Liu, Wenyi Yu, Guangzhi Sun, Qiuqiang Kong, Chao Zhang

Score-based generative models (SGMs) have recently shown impressive results for difficult generative tasks such as the unconditional and conditional generation of natural images and audio signals. In this work, we extend these models to the…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-07-08 · Simon Welker, Julius Richter, Timo Gerkmann

Diffusion models have emerged as powerful deep generative techniques, producing high-quality and diverse samples in applications in various domains including audio. While existing reviews provide overviews, there remains limited in-depth…

Sound · Computer Science · 2026-01-16 · Ge Zhu, Yutong Wen, Zhiyao Duan

Score-based modeling through stochastic differential equations (SDEs) has provided a new perspective on diffusion models, and demonstrated superior performance on continuous data. However, the gradient of the log-likelihood function, i.e.,…

Machine Learning · Computer Science · 2023-03-07 · Haoran Sun, Lijun Yu, Bo Dai, Dale Schuurmans, Hanjun Dai