Related papers: Diffusion-based Generative Speech Source Separatio…

EDSep: An Effective Diffusion-Based Method for Speech Source Separation

Generative models have attracted considerable attention for speech separation tasks, and among these, diffusion-based methods are being explored. Despite the notable success of diffusion techniques in generation tasks, their adaptation to…

Audio and Speech Processing · Electrical Eng. & Systems 2025-01-28 Jinwei Dong, Xinsheng Wang, Qirong Mao

Conditional Diffusion Model for Target Speaker Extraction

We propose DiffSpEx, a generative target speaker extraction method based on score-based generative modelling through stochastic differential equations. DiffSpEx deploys a continuous-time stochastic diffusion process in the complex…

Audio and Speech Processing · Electrical Eng. & Systems 2023-10-10 Theodor Nguyen, Guangzhi Sun, Xianrui Zheng, Chao Zhang, Philip C Woodland

A Tutorial on Diffusion Theory: From Differential Equations to Diffusion Models

Diffusion models have emerged as a dominant framework for generative modeling, but their mathematical foundations are often presented separately through diffusion probabilistic models, score-based modeling, stochastic differential…

Machine Learning · Computer Science 2026-05-29 Jiayi Fu, Yuxia Wang

Noise-robust Speech Separation with Fast Generative Correction

Speech separation, the task of isolating multiple speech sources from a mixed audio signal, remains challenging in noisy environments. In this paper, we propose a generative correction method to enhance the output of a discriminative…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-12 Helin Wang, Jesus Villalba, Laureano Moro-Velazquez, Jiarui Hai, Thomas Thebaud, Najim Dehak

Score-based Source Separation with Applications to Digital Communication Signals

We propose a new method for separating superimposed sources using diffusion-based generative models. Our method relies only on separately trained statistical priors of independent sources to establish a new objective function guided by…

Machine Learning · Computer Science 2024-01-18 Tejas Jayashankar, Gary C. F. Lee, Alejandro Lancho, Amir Weiss, Yury Polyanskiy, Gregory W. Wornell

Speech Enhancement and Dereverberation with Diffusion-based Generative Models

In this work, we build upon our previous publication and use diffusion-based generative models for speech enhancement. We present a detailed overview of the diffusion process that is based on a stochastic differential equation and delve…

Audio and Speech Processing · Electrical Eng. & Systems 2025-10-14 Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Timo Gerkmann

A Variational Perspective on Diffusion-Based Generative Models and Score Matching

Discrete-time diffusion-based generative models and score matching methods have shown promising results in modeling high-dimensional image data. Recently, Song et al. (2021) show that diffusion processes that transform data into noise can…

Machine Learning · Computer Science 2021-10-01 Chin-Wei Huang, Jae Hyun Lim, Aaron Courville

Single and Few-step Diffusion for Generative Speech Enhancement

Diffusion models have shown promising results in speech enhancement, using a task-adapted diffusion process for the conditional generation of clean speech given a noisy mixture. However, at test time, the neural network used for score…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-17 Bunlong Lay, Jean-Marie Lemercier, Julius Richter, Timo Gerkmann

Diffusion-Based Unsupervised Audio-Visual Speech Separation in Noisy Environments with Noise Prior

In this paper, we address the problem of single-microphone speech separation in the presence of ambient noise. We propose a generative unsupervised technique that directly models both clean speech and structured noise components, training…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-19 Yochai Yemini, Rami Ben-Ari, Sharon Gannot, Ethan Fetaya

Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement

Diffusion-based generative models (DGMs) have recently attracted attention in speech enhancement research (SE) as previous works showed a remarkable generalization capability. However, DGMs are also computationally intensive, as they…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-21 Chenda Li, Samuele Cornell, Shinji Watanabe, Yanmin Qian

Unsupervised Single-Channel Speech Separation with a Diffusion Prior under Speaker-Embedding Guidance

Speech separation is a fundamental task in audio processing, typically addressed with fully supervised systems trained on paired mixtures. While effective, such systems typically rely on synthetic data pipelines, which may not reflect…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-30 Runwu Shi, Kai Li, Chang Li, Jiang Wang, Sihan Tan, Kazuhiro Nakadai

Score-based Generative Modeling of Graphs via the System of Stochastic Differential Equations

Generating graph-structured data requires learning the underlying distribution of graphs. Yet, this is a challenging problem, and the previous graph generative methods either fail to capture the permutation-invariance property of graphs or…

Machine Learning · Computer Science 2022-06-16 Jaehyeong Jo, Seul Lee, Sung Ju Hwang

Score-based Generative Modeling Through Backward Stochastic Differential Equations: Inversion and Generation

The proposed BSDE-based diffusion model represents a novel approach to diffusion modeling, which extends the application of stochastic differential equations (SDEs) in machine learning. Unlike traditional SDE-based diffusion models, our…

Machine Learning · Computer Science 2023-04-27 Zihao Wang

Generating Separated Singing Vocals Using a Diffusion Model Conditioned on Music Mixtures

Separating the individual elements in a musical mixture is an essential process for music analysis and practice. While this is generally addressed using neural networks optimized to mask or transform the time-frequency representation of a…

Sound · Computer Science 2025-11-27 Genís Plaja-Roglans, Yun-Ning Hung, Xavier Serra, Igor Pereira

Score-Based Generative Modeling through Stochastic Differential Equations

Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting…

Machine Learning · Computer Science 2021-02-11 Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole

User-guided Generative Source Separation

Music source separation (MSS) aims to extract individual instrument sources from their mixture. While most existing methods focus on the widely adopted four-stem separation setup (vocals, bass, drums, and other instruments), this approach…

Sound · Computer Science 2025-08-06 Yutong Wen, Minje Kim, Paris Smaragdis

Multi-Source Diffusion Models for Simultaneous Music Generation and Separation

In this work, we define a diffusion-based generative model capable of both music synthesis and source separation by learning the score of the joint probability density of sources sharing a context. Alongside the classic total inference…

Sound · Computer Science 2024-03-19 Giorgio Mariani, Irene Tallini, Emilian Postolache, Michele Mancusi, Luca Cosmo, Emanuele Rodolà

Efficient and Fast Generative-Based Singing Voice Separation using a Latent Diffusion Model

Extracting individual elements from music mixtures is a valuable tool for music production and practice. While neural networks optimized to mask or transform mixture spectrograms into the individual source(s) have been the leading approach,…

Sound · Computer Science 2025-11-26 Genís Plaja-Roglans, Yun-Ning Hung, Xavier Serra, Igor Pereira

Diffusion-based speech enhancement with a weighted generative-supervised learning loss

Diffusion-based generative models have recently gained attention in speech enhancement (SE), providing an alternative to conventional supervised methods. These models transform clean speech training samples into Gaussian noise centered at…

Computer Vision and Pattern Recognition · Computer Science 2023-09-20 Jean-Eudes Ayilo, Mostafa Sadeghi, Romain Serizel

Score-based Continuous-time Discrete Diffusion Models

Score-based modeling through stochastic differential equations (SDEs) has provided a new perspective on diffusion models, and demonstrated superior performance on continuous data. However, the gradient of the log-likelihood function, i.e.,…

Machine Learning · Computer Science 2023-03-07 Haoran Sun, Lijun Yu, Bo Dai, Dale Schuurmans, Hanjun Dai