Related papers: Noise-aware Speech Enhancement using Diffusion Pro…

NADiffuSE: Noise-aware Diffusion-based Model for Speech Enhancement

The goal of speech enhancement (SE) is to eliminate the background interference from the noisy speech signal. Generative models such as diffusion models (DM) have been applied to the task of SE because of better generalization in unseen…

Sound · Computer Science 2023-09-06 Wen Wang, Dongchao Yang, Qichen Ye, Bowen Cao, Yuexian Zou

A Study on Speech Enhancement Based on Diffusion Probabilistic Model

Diffusion probabilistic models have demonstrated an outstanding capability to model natural images and raw audio waveforms through a paired diffusion and reverse processes. The unique property of the reverse process (namely, eliminating…

Audio and Speech Processing · Electrical Eng. & Systems 2021-11-23 Yen-Ju Lu, Yu Tsao, Shinji Watanabe

Diffusion-based speech enhancement with a weighted generative-supervised learning loss

Diffusion-based generative models have recently gained attention in speech enhancement (SE), providing an alternative to conventional supervised methods. These models transform clean speech training samples into Gaussian noise centered at…

Computer Vision and Pattern Recognition · Computer Science 2023-09-20 Jean-Eudes Ayilo, Mostafa Sadeghi, Romain Serizel

GDiffuSE: Diffusion-based speech enhancement with noise model guidance

This paper introduces a novel speech enhancement (SE) approach based on a denoising diffusion probabilistic model (DDPM), termed Guided diffusion for speech enhancement (GDiffuSE). In contrast to conventional methods that directly map noisy…

Sound · Computer Science 2026-03-03 Efrayim Yanir, David Burshtein, Sharon Gannot

Conditional Diffusion Probabilistic Model for Speech Enhancement

Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs. While generative models have shown strong potential in speech synthesis, they are…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-11 Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao

Diffusion-based Unsupervised Audio-visual Speech Enhancement

This paper proposes a new unsupervised audio-visual speech enhancement (AVSE) approach that combines a diffusion-based audio-visual speech generative model with a non-negative matrix factorization (NMF) noise model. First, the diffusion…

Sound · Computer Science 2025-01-16 Jean-Eudes Ayilo, Mostafa Sadeghi, Romain Serizel, Xavier Alameda-Pineda

Diffusion-based Frameworks for Unsupervised Speech Enhancement

This paper addresses unsupervised diffusion-based single-channel speech enhancement (SE). Prior work in this direction combines a score-based diffusion model trained on clean speech with a Gaussian noise model whose covariance is structured…

Sound · Computer Science 2026-05-26 Jean-Eudes Ayilo, Mostafa Sadeghi, Romain Serizel, Xavier Alameda-Pineda

Inference and Denoise: Causal Inference-based Neural Speech Enhancement

This study addresses the speech enhancement (SE) task within the causal inference paradigm by modeling the noise presence as an intervention. Based on the potential outcome framework, the proposed causal inference-based speech enhancement…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-03 Tsun-An Hsieh, Chao-Han Huck Yang, Pin-Yu Chen, Sabato Marco Siniscalchi, Yu Tsao

Speech Enhancement and Dereverberation with Diffusion-based Generative Models

In this work, we build upon our previous publication and use diffusion-based generative models for speech enhancement. We present a detailed overview of the diffusion process that is based on a stochastic differential equation and delve…

Audio and Speech Processing · Electrical Eng. & Systems 2025-10-14 Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Timo Gerkmann

Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement

Although deep neural network (DNN)-based speech enhancement (SE) methods outperform the previous non-DNN-based ones, they often degrade the perceptual quality of generated outputs. To tackle this problem, we introduce a DNN-based generative…

Audio and Speech Processing · Electrical Eng. & Systems 2023-08-31 Ryosuke Sawata, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji

Speech Enhancement based on cascaded two flows

Speech enhancement (SE) based on diffusion probabilistic models has exhibited impressive performance, while requiring a relatively high number of function evaluations (NFE). Recently, SE based on flow matching has been proposed, which…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-20 Seonggyu Lee, Sein Cheong, Sangwook Han, Kihyuk Kim, Jong Won Shin

Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders

Diffusion-based generative speech enhancement (SE) has recently received attention, but reverse diffusion remains time-consuming. One solution is to initialize the reverse diffusion process with enhanced features estimated by a predictive…

Sound · Computer Science 2024-02-29 Hao Shi, Kazuki Shimada, Masato Hirano, Takashi Shibuya, Yuichiro Koyama, Zhi Zhong, Shusuke Takahashi, Tatsuya Kawahara, Yuki Mitsufuji

Combining Deterministic Enhanced Conditions with Dual-Streaming Encoding for Diffusion-Based Speech Enhancement

Diffusion-based speech enhancement (SE) models need to incorporate correct prior knowledge as reliable conditions to generate accurate predictions. However, providing reliable conditions using noisy features is challenging. One solution is…

Sound · Computer Science 2025-10-08 Hao Shi, Xugang Lu, Kazuki Shimada, Tatsuya Kawahara

Target Speech Extraction with Conditional Diffusion Model

Diffusion model-based speech enhancement has received increased attention since it can generate very natural enhanced signals and generalizes well to unseen conditions. Diffusion models have been explored for several sub-tasks of speech…

Audio and Speech Processing · Electrical Eng. & Systems 2023-08-21 Naoyuki Kamo, Marc Delcroix, Tomohiro Nakatani

Pre-training Feature Guided Diffusion Model for Speech Enhancement

Speech enhancement significantly improves the clarity and intelligibility of speech in noisy environments, improving communication and listening experiences. In this paper, we introduce a novel pretraining feature-guided diffusion model…

Sound · Computer Science 2024-06-13 Yiyuan Yang, Niki Trigoni, Andrew Markham

Diffusion-based Speech Enhancement with Schrödinger Bridge and Symmetric Noise Schedule

Recently, diffusion-based generative models have demonstrated remarkable performance in speech enhancement tasks. However, these methods still encounter challenges, including the lack of structural information and poor performance in low…

Audio and Speech Processing · Electrical Eng. & Systems 2024-09-16 Siyi Wang, Siyi Liu, Andrew Harper, Paul Kendrick, Mathieu Salzmann, Milos Cernak

Metric-oriented Speech Enhancement using Diffusion Probabilistic Model

Deep neural network based speech enhancement technique focuses on learning a noisy-to-clean transformation supervised by paired training data. However, the task-specific evaluation metric (e.g., PESQ) is usually non-differentiable and can…

Sound · Computer Science 2023-02-24 Chen Chen, Yuchen Hu, Weiwei Weng, Eng Siong Chng

FlowSE: Efficient and High-Quality Speech Enhancement via Flow Matching

Generative models have excelled in audio tasks using approaches such as language models, diffusion, and flow matching. However, existing generative approaches for speech enhancement (SE) face notable challenges: language model-based methods…

Audio and Speech Processing · Electrical Eng. & Systems 2025-05-28 Ziqian Wang, Zikai Liu, Xinfa Zhu, Yike Zhu, Mingshuai Liu, Jun Chen, Longshuai Xiao, Chao Weng, Lei Xie

ProSE: Diffusion Priors for Speech Enhancement

Speech enhancement (SE) is the foundational task of enhancing the clarity and quality of speech in the presence of non-stationary additive noise. While deterministic deep learning models have been commonly employed for SE, recent research…

Audio and Speech Processing · Electrical Eng. & Systems 2025-03-11 Sonal Kumar, Sreyan Ghosh, Utkarsh Tyagi, Anton Jeran Ratnarajah, Chandra Kiran Reddy Evuru, Ramani Duraiswami, Dinesh Manocha

GALD-SE: Guided Anisotropic Lightweight Diffusion for Efficient Speech Enhancement

Speech enhancement is designed to enhance the intelligibility and quality of speech across diverse noise conditions. Recently, diffusion model has gained lots of attention in speech enhancement area, achieving competitive results. Current…

Sound · Computer Science 2025-01-23 Chengzhong Wang, Jianjun Gu, Dingding Yao, Junfeng Li, Yonghong Yan