Related papers: Diffusion-based Frameworks for Unsupervised Speech…

Diffusion-based Unsupervised Audio-visual Speech Enhancement

This paper proposes a new unsupervised audio-visual speech enhancement (AVSE) approach that combines a diffusion-based audio-visual speech generative model with a non-negative matrix factorization (NMF) noise model. First, the diffusion…

Sound · Computer Science 2025-01-16 Jean-Eudes Ayilo, Mostafa Sadeghi, Romain Serizel, Xavier Alameda-Pineda

Unsupervised speech enhancement with diffusion-based generative models

Recently, conditional score-based diffusion models have gained significant attention in the field of supervised speech enhancement, yielding state-of-the-art performance. However, these methods may face challenges when generalising to…

Computer Vision and Pattern Recognition · Computer Science 2023-09-20 Berné Nortier, Mostafa Sadeghi, Romain Serizel

NADiffuSE: Noise-aware Diffusion-based Model for Speech Enhancement

The goal of speech enhancement (SE) is to eliminate the background interference from the noisy speech signal. Generative models such as diffusion models (DM) have been applied to the task of SE because of better generalization in unseen…

Sound · Computer Science 2023-09-06 Wen Wang, Dongchao Yang, Qichen Ye, Bowen Cao, Yuexian Zou

Diffusion-based speech enhancement with a weighted generative-supervised learning loss

Diffusion-based generative models have recently gained attention in speech enhancement (SE), providing an alternative to conventional supervised methods. These models transform clean speech training samples into Gaussian noise centered at…

Computer Vision and Pattern Recognition · Computer Science 2023-09-20 Jean-Eudes Ayilo, Mostafa Sadeghi, Romain Serizel

GDiffuSE: Diffusion-based speech enhancement with noise model guidance

This paper introduces a novel speech enhancement (SE) approach based on a denoising diffusion probabilistic model (DDPM), termed Guided diffusion for speech enhancement (GDiffuSE). In contrast to conventional methods that directly map noisy…

Sound · Computer Science 2026-03-03 Efrayim Yanir, David Burshtein, Sharon Gannot

Noise-aware Speech Enhancement using Diffusion Probabilistic Model

With recent advances of diffusion model, generative speech enhancement (SE) has attracted a surge of research interest due to its great potential for unseen testing noises. However, existing efforts mainly focus on inherent properties of…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-05 Yuchen Hu, Chen Chen, Ruizhe Li, Qiushi Zhu, Eng Siong Chng

ProSE: Diffusion Priors for Speech Enhancement

Speech enhancement (SE) is the foundational task of enhancing the clarity and quality of speech in the presence of non-stationary additive noise. While deterministic deep learning models have been commonly employed for SE, recent research…

Audio and Speech Processing · Electrical Eng. & Systems 2025-03-11 Sonal Kumar, Sreyan Ghosh, Utkarsh Tyagi, Anton Jeran Ratnarajah, Chandra Kiran Reddy Evuru, Ramani Duraiswami, Dinesh Manocha

Posterior Transition Modeling for Unsupervised Diffusion-Based Speech Enhancement

We explore unsupervised speech enhancement using diffusion models as expressive generative priors for clean speech. Existing approaches guide the reverse diffusion process using noisy speech through an approximate, noise-perturbed…

Sound · Computer Science 2025-07-04 Mostafa Sadeghi, Jean-Eudes Ayilo, Romain Serizel, Xavier Alameda-Pineda

Speech Enhancement based on cascaded two flows

Speech enhancement (SE) based on diffusion probabilistic models has exhibited impressive performance, while requiring a relatively high number of function evaluations (NFE). Recently, SE based on flow matching has been proposed, which…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-20 Seonggyu Lee, Sein Cheong, Sangwook Han, Kihyuk Kim, Jong Won Shin

A Study on Speech Enhancement Based on Diffusion Probabilistic Model

Diffusion probabilistic models have demonstrated an outstanding capability to model natural images and raw audio waveforms through a paired diffusion and reverse processes. The unique property of the reverse process (namely, eliminating…

Audio and Speech Processing · Electrical Eng. & Systems 2021-11-23 Yen-Ju Lu, Yu Tsao, Shinji Watanabe

Combining Deterministic Enhanced Conditions with Dual-Streaming Encoding for Diffusion-Based Speech Enhancement

Diffusion-based speech enhancement (SE) models need to incorporate correct prior knowledge as reliable conditions to generate accurate predictions. However, providing reliable conditions using noisy features is challenging. One solution is…

Sound · Computer Science 2025-10-08 Hao Shi, Xugang Lu, Kazuki Shimada, Tatsuya Kawahara

Self-supervised learning with diffusion-based multichannel speech enhancement for speaker verification under noisy conditions

The paper introduces Diff-Filter, a multichannel speech enhancement approach based on the diffusion probabilistic model, for improving speaker verification performance under noisy and reverberant conditions. It also presents a new two-step…

Sound · Computer Science 2023-07-06 Sandipana Dowerah, Ajinkya Kulkarni, Romain Serizel, Denis Jouvet

Unsupervised speech enhancement with deep dynamical generative speech and noise models

This work builds on a previous work on unsupervised speech enhancement using a dynamical variational autoencoder (DVAE) as the clean speech model and non-negative matrix factorization (NMF) as the noise model. We propose to replace the NMF…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-14 Xiaoyu Lin, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda

Statistical Speech Enhancement Based on Probabilistic Integration of Variational Autoencoder and Non-Negative Matrix Factorization

This paper presents a statistical method of single-channel speech enhancement that uses a variational autoencoder (VAE) as a prior distribution on clean speech. A standard approach to speech enhancement is to train a deep neural network…

Sound · Computer Science 2019-03-12 Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara

Speech Enhancement and Dereverberation with Diffusion-based Generative Models

In this work, we build upon our previous publication and use diffusion-based generative models for speech enhancement. We present a detailed overview of the diffusion process that is based on a stochastic differential equation and delve…

Audio and Speech Processing · Electrical Eng. & Systems 2025-10-14 Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Timo Gerkmann

Audio-Visual Speech Enhancement with Score-Based Generative Models

This paper introduces an audio-visual speech enhancement system that leverages score-based generative models, also known as diffusion models, conditioned on visual information. In particular, we exploit audio-visual embeddings obtained from…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-05 Julius Richter, Simone Frintrop, Timo Gerkmann

FlowSE: Efficient and High-Quality Speech Enhancement via Flow Matching

Generative models have excelled in audio tasks using approaches such as language models, diffusion, and flow matching. However, existing generative approaches for speech enhancement (SE) face notable challenges: language model-based methods…

Audio and Speech Processing · Electrical Eng. & Systems 2025-05-28 Ziqian Wang, Zikai Liu, Xinfa Zhu, Yike Zhu, Mingshuai Liu, Jun Chen, Longshuai Xiao, Chao Weng, Lei Xie

Semi-supervised multichannel speech enhancement with variational autoencoders and non-negative matrix factorization

In this paper we address speaker-independent multichannel speech enhancement in unknown noisy environments. Our work is based on a well-established multichannel local Gaussian modeling framework. We propose to use a neural network for…

Sound · Computer Science 2019-05-01 Simon Leglaive, Laurent Girin, Radu Horaud

Single and Few-step Diffusion for Generative Speech Enhancement

Diffusion models have shown promising results in speech enhancement, using a task-adapted diffusion process for the conditional generation of clean speech given a noisy mixture. However, at test time, the neural network used for score…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-17 Bunlong Lay, Jean-Marie Lemercier, Julius Richter, Timo Gerkmann

Diffusion-based Speech Enhancement with Schrödinger Bridge and Symmetric Noise Schedule

Recently, diffusion-based generative models have demonstrated remarkable performance in speech enhancement tasks. However, these methods still encounter challenges, including the lack of structural information and poor performance in low…

Audio and Speech Processing · Electrical Eng. & Systems 2024-09-16 Siyi Wang, Siyi Liu, Andrew Harper, Paul Kendrick, Mathieu Salzmann, Milos Cernak