Related papers: Cold Diffusion for Speech Enhancement

A Study on Speech Enhancement Based on Diffusion Probabilistic Model

Diffusion probabilistic models have demonstrated an outstanding capability to model natural images and raw audio waveforms through a paired diffusion and reverse processes. The unique property of the reverse process (namely, eliminating…

Audio and Speech Processing · Electrical Eng. & Systems 2021-11-23 Yen-Ju Lu , Yu Tsao , Shinji Watanabe

Conditional Diffusion Probabilistic Model for Speech Enhancement

Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs. While generative models have shown strong potential in speech synthesis, they are…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-11 Yen-Ju Lu , Zhong-Qiu Wang , Shinji Watanabe , Alexander Richard , Cheng Yu , Yu Tsao

Speech Enhancement and Dereverberation with Diffusion-based Generative Models

In this work, we build upon our previous publication and use diffusion-based generative models for speech enhancement. We present a detailed overview of the diffusion process that is based on a stochastic differential equation and delve…

Audio and Speech Processing · Electrical Eng. & Systems 2025-10-14 Julius Richter , Simon Welker , Jean-Marie Lemercier , Bunlong Lay , Timo Gerkmann

Diffusion Models for Audio Restoration

With the development of audio playback devices and fast data transmission, the demand for high sound quality is rising for both entertainment and communications. In this quest for better sound quality, challenges emerge from distortions and…

Audio and Speech Processing · Electrical Eng. & Systems 2024-11-12 Jean-Marie Lemercier , Julius Richter , Simon Welker , Eloi Moliner , Vesa Välimäki , Timo Gerkmann

Diffusion-based Signal Refiner for Speech Enhancement and Separation

Although recent speech processing technologies have achieved significant improvements in objective metrics, there still remains a gap in human perceptual quality. This paper proposes Diffiner, a novel solution that utilizes the powerful…

Audio and Speech Processing · Electrical Eng. & Systems 2026-02-11 Masato Hirano , Ryosuke Sawata , Naoki Murata , Shusuke Takahashi , Yuki Mitsufuji

GALD-SE: Guided Anisotropic Lightweight Diffusion for Efficient Speech Enhancement

Speech enhancement is designed to enhance the intelligibility and quality of speech across diverse noise conditions. Recently, diffusion model has gained lots of attention in speech enhancement area, achieving competitive results. Current…

Sound · Computer Science 2025-01-23 Chengzhong Wang , Jianjun Gu , Dingding Yao , Junfeng Li , Yonghong Yan

Single and Few-step Diffusion for Generative Speech Enhancement

Diffusion models have shown promising results in speech enhancement, using a task-adapted diffusion process for the conditional generation of clean speech given a noisy mixture. However, at test time, the neural network used for score…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-17 Bunlong Lay , Jean-Marie Lemercier , Julius Richter , Timo Gerkmann

On the Application of Diffusion Models for Simultaneous Denoising and Dereverberation

Diffusion models have been shown to achieve natural-sounding enhancement of speech degraded by noise or reverberation. However, their simultaneous denoising and dereverberation capability has so far not been studied much, although this is…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-27 Adrian Meise , Tobias Cord-Landwehr , Reinhold Haeb-Umbach

TransFusion: Transcribing Speech with Multinomial Diffusion

Diffusion models have shown exceptional scaling properties in the image synthesis domain, and initial attempts have shown similar benefits for applying diffusion to unconditional text synthesis. Denoising diffusion models attempt to…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-17 Matthew Baas , Kevin Eloff , Herman Kamper

StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation

Diffusion models have shown a great ability at bridging the performance gap between predictive and generative approaches for speech enhancement. We have shown that they may even outperform their predictive counterparts for non-additive…

Audio and Speech Processing · Electrical Eng. & Systems 2024-03-13 Jean-Marie Lemercier , Julius Richter , Simon Welker , Timo Gerkmann

Diffusion Buffer: Online Diffusion-based Speech Enhancement with Sub-Second Latency

Diffusion models are a class of generative models that have been recently used for speech enhancement with remarkable success but are computationally expensive at inference time. Therefore, these models are impractical for processing…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-15 Bunlong Lay , Rostislav Makarov , Timo Gerkmann

Investigating the Design Space of Diffusion Models for Speech Enhancement

Diffusion models are a new class of generative models that have shown outstanding performance in image generation literature. As a consequence, studies have attempted to apply diffusion models to other tasks, such as speech enhancement. A…

Audio and Speech Processing · Electrical Eng. & Systems 2024-10-10 Philippe Gonzalez , Zheng-Hua Tan , Jan Østergaard , Jesper Jensen , Tommy Sonne Alstrøm , Tobias May

MDDM: A Multi-view Discriminative Enhanced Diffusion-based Model for Speech Enhancement

With the development of deep learning, speech enhancement has been greatly optimized in terms of speech quality. Previous methods typically focus on the discriminative supervised learning or generative modeling, which tends to introduce…

Audio and Speech Processing · Electrical Eng. & Systems 2025-10-31 Nan Xu , Zhaolong Huang , Xiaonan Zhi

Diffusion Posterior Proximal Sampling for Image Restoration

Diffusion models have demonstrated remarkable efficacy in generating high-quality samples. Existing diffusion-based image restoration algorithms exploit pre-trained diffusion models to leverage data priors, yet they still preserve elements…

Image and Video Processing · Electrical Eng. & Systems 2024-08-07 Hongjie Wu , Linchao He , Mingqin Zhang , Dongdong Chen , Kunming Luo , Mengting Luo , Ji-Zhe Zhou , Hu Chen , Jiancheng Lv

Speech Signal Improvement Using Causal Generative Diffusion Models

In this paper, we present a causal speech signal improvement system that is designed to handle different types of distortions. The method is based on a generative diffusion model which has been shown to work well in scenarios with missing…

Audio and Speech Processing · Electrical Eng. & Systems 2023-03-16 Julius Richter , Simon Welker , Jean-Marie Lemercier , Bunlong Lay , Tal Peer , Timo Gerkmann

Posterior Transition Modeling for Unsupervised Diffusion-Based Speech Enhancement

We explore unsupervised speech enhancement using diffusion models as expressive generative priors for clean speech. Existing approaches guide the reverse diffusion process using noisy speech through an approximate, noise-perturbed…

Sound · Computer Science 2025-07-04 Mostafa Sadeghi , Jean-Eudes Ayilo , Romain Serizel , Xavier Alameda-Pineda

SEED: Speaker Embedding Enhancement Diffusion Model

A primary challenge when deploying speaker recognition systems in real-world applications is performance degradation caused by environmental mismatch. We propose a diffusion-based method that takes speaker embeddings extracted from a…

Audio and Speech Processing · Electrical Eng. & Systems 2025-05-23 KiHyun Nam , Jungwoo Heo , Jee-weon Jung , Gangin Park , Chaeyoung Jung , Ha-Jin Yu , Joon Son Chung

DDTSE: Discriminative Diffusion Model for Target Speech Extraction

Diffusion models have gained attention in speech enhancement tasks, providing an alternative to conventional discriminative methods. However, research on target speech extraction under multi-speaker noisy conditions remains relatively…

Audio and Speech Processing · Electrical Eng. & Systems 2024-10-08 Leying Zhang , Yao Qian , Linfeng Yu , Heming Wang , Hemin Yang , Long Zhou , Shujie Liu , Yanmin Qian

Extract and Diffuse: Latent Integration for Improved Diffusion-based Speech and Vocal Enhancement

Diffusion-based generative models have recently achieved remarkable results in speech and vocal enhancement due to their ability to model complex speech data distributions. While these models generalize well to unseen acoustic environments,…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-23 Yudong Yang , Zhan Liu , Wenyi Yu , Guangzhi Sun , Qiuqiang Kong , Chao Zhang

Absorbing Discrete Diffusion for Speech Enhancement

Inspired by recent developments in neural speech coding and diffusion-based language modeling, we tackle speech enhancement by modeling the conditional distribution of clean speech codes given noisy speech codes using absorbing discrete…

Sound · Computer Science 2026-02-27 Philippe Gonzalez