Related papers: Unsupervised speech enhancement with diffusion-bas…

Diffusion-based Frameworks for Unsupervised Speech Enhancement

This paper addresses unsupervised diffusion-based single-channel speech enhancement (SE). Prior work in this direction combines a score-based diffusion model trained on clean speech with a Gaussian noise model whose covariance is structured…

Sound · Computer Science 2026-05-26 Jean-Eudes Ayilo, Mostafa Sadeghi, Romain Serizel, Xavier Alameda-Pineda

Posterior Transition Modeling for Unsupervised Diffusion-Based Speech Enhancement

We explore unsupervised speech enhancement using diffusion models as expressive generative priors for clean speech. Existing approaches guide the reverse diffusion process using noisy speech through an approximate, noise-perturbed…

Sound · Computer Science 2025-07-04 Mostafa Sadeghi, Jean-Eudes Ayilo, Romain Serizel, Xavier Alameda-Pineda

Diffusion-based speech enhancement with a weighted generative-supervised learning loss

Diffusion-based generative models have recently gained attention in speech enhancement (SE), providing an alternative to conventional supervised methods. These models transform clean speech training samples into Gaussian noise centered at…

Computer Vision and Pattern Recognition · Computer Science 2023-09-20 Jean-Eudes Ayilo, Mostafa Sadeghi, Romain Serizel

Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain

Score-based generative models (SGMs) have recently shown impressive results for difficult generative tasks such as the unconditional and conditional generation of natural images and audio signals. In this work, we extend these models to the…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-08 Simon Welker, Julius Richter, Timo Gerkmann

Diffusion-based Unsupervised Audio-visual Speech Enhancement

This paper proposes a new unsupervised audio-visual speech enhancement (AVSE) approach that combines a diffusion-based audio-visual speech generative model with a non-negative matrix factorization (NMF) noise model. First, the diffusion…

Sound · Computer Science 2025-01-16 Jean-Eudes Ayilo, Mostafa Sadeghi, Romain Serizel, Xavier Alameda-Pineda

Audio-Visual Speech Enhancement with Score-Based Generative Models

This paper introduces an audio-visual speech enhancement system that leverages score-based generative models, also known as diffusion models, conditioned on visual information. In particular, we exploit audio-visual embeddings obtained from…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-05 Julius Richter, Simone Frintrop, Timo Gerkmann

Speech Enhancement and Dereverberation with Diffusion-based Generative Models

In this work, we build upon our previous publication and use diffusion-based generative models for speech enhancement. We present a detailed overview of the diffusion process that is based on a stochastic differential equation and delve…

Audio and Speech Processing · Electrical Eng. & Systems 2025-10-14 Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Timo Gerkmann

Pre-training Feature Guided Diffusion Model for Speech Enhancement

Speech enhancement significantly improves the clarity and intelligibility of speech in noisy environments, improving communication and listening experiences. In this paper, we introduce a novel pretraining feature-guided diffusion model…

Sound · Computer Science 2024-06-13 Yiyuan Yang, Niki Trigoni, Andrew Markham

Single and Few-step Diffusion for Generative Speech Enhancement

Diffusion models have shown promising results in speech enhancement, using a task-adapted diffusion process for the conditional generation of clean speech given a noisy mixture. However, at test time, the neural network used for score…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-17 Bunlong Lay, Jean-Marie Lemercier, Julius Richter, Timo Gerkmann

uSee: Unified Speech Enhancement and Editing with Conditional Diffusion Models

Speech enhancement aims to improve the quality of speech signals in terms of quality and intelligibility, and speech editing refers to the process of editing the speech according to specific user needs. In this paper, we propose a Unified…

Sound · Computer Science 2023-10-03 Muqiao Yang, Chunlei Zhang, Yong Xu, Zhongweiyang Xu, Heming Wang, Bhiksha Raj, Dong Yu

Diffusion-Based Unsupervised Audio-Visual Speech Separation in Noisy Environments with Noise Prior

In this paper, we address the problem of single-microphone speech separation in the presence of ambient noise. We propose a generative unsupervised technique that directly models both clean speech and structured noise components, training…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-19 Yochai Yemini, Rami Ben-Ari, Sharon Gannot, Ethan Fetaya

Non-intrusive Speech Quality Assessment with Diffusion Models Trained on Clean Speech

Diffusion models have found great success in generating high quality, natural samples of speech, but their potential for density estimation for speech has so far remained largely unexplored. In this work, we leverage an unconditional…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-16 Danilo de Oliveira, Julius Richter, Jean-Marie Lemercier, Simon Welker, Timo Gerkmann

Unsupervised speech enhancement with deep dynamical generative speech and noise models

This work builds on a previous work on unsupervised speech enhancement using a dynamical variational autoencoder (DVAE) as the clean speech model and non-negative matrix factorization (NMF) as the noise model. We propose to replace the NMF…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-14 Xiaoyu Lin, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda

Extract and Diffuse: Latent Integration for Improved Diffusion-based Speech and Vocal Enhancement

Diffusion-based generative models have recently achieved remarkable results in speech and vocal enhancement due to their ability to model complex speech data distributions. While these models generalize well to unseen acoustic environments,…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-23 Yudong Yang, Zhan Liu, Wenyi Yu, Guangzhi Sun, Qiuqiang Kong, Chao Zhang

Diffusion-Model-Assisted Supervised Learning of Generative Models for Density Estimation

We present a supervised learning framework of training generative models for density estimation. Generative models, including generative adversarial networks, normalizing flows, variational auto-encoders, are usually considered as…

Machine Learning · Computer Science 2023-10-24 Yanfang Liu, Minglei Yang, Zezhong Zhang, Feng Bao, Yanzhao Cao, Guannan Zhang

MDDM: A Multi-view Discriminative Enhanced Diffusion-based Model for Speech Enhancement

With the development of deep learning, speech enhancement has been greatly optimized in terms of speech quality. Previous methods typically focus on the discriminative supervised learning or generative modeling, which tends to introduce…

Audio and Speech Processing · Electrical Eng. & Systems 2025-10-31 Nan Xu, Zhaolong Huang, Xiaonan Zhi

Investigating the Design Space of Diffusion Models for Speech Enhancement

Diffusion models are a new class of generative models that have shown outstanding performance in image generation literature. As a consequence, studies have attempted to apply diffusion models to other tasks, such as speech enhancement. A…

Audio and Speech Processing · Electrical Eng. & Systems 2024-10-10 Philippe Gonzalez, Zheng-Hua Tan, Jan Østergaard, Jesper Jensen, Tommy Sonne Alstrøm, Tobias May

Investigating the Effects of Diffusion-based Conditional Generative Speech Models Used for Speech Enhancement on Dysarthric Speech

In this study, we aim to explore the effect of pre-trained conditional generative speech models for the first time on dysarthric speech due to Parkinson's disease recorded in an ideal/non-noisy condition. Considering one category of…

Audio and Speech Processing · Electrical Eng. & Systems 2024-12-19 Joanna Reszka, Parvaneh Janbakhshi, Tilak Purohit, Sadegh Mohammadi

Conditional Diffusion Probabilistic Model for Speech Enhancement

Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs. While generative models have shown strong potential in speech synthesis, they are…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-11 Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao

Self-supervised learning with diffusion-based multichannel speech enhancement for speaker verification under noisy conditions

The paper introduces Diff-Filter, a multichannel speech enhancement approach based on the diffusion probabilistic model, for improving speaker verification performance under noisy and reverberant conditions. It also presents a new two-step…

Sound · Computer Science 2023-07-06 Sandipana Dowerah, Ajinkya Kulkarni, Romain Serizel, Denis Jouvet