Related papers: DiffPhase: Generative Diffusion-based STFT Phase R…

StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation

Diffusion models have shown a great ability to bridge the performance gap between predictive and generative approaches for speech enhancement. We have shown that they may even outperform their predictive counterparts for non-additive…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-03-13 · Jean-Marie Lemercier, Julius Richter, Simon Welker, Timo Gerkmann

Single and Few-step Diffusion for Generative Speech Enhancement

Diffusion models have shown promising results in speech enhancement, using a task-adapted diffusion process for the conditional generation of clean speech given a noisy mixture. However, at test time, the neural network used for score…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-01-17 · Bunlong Lay, Jean-Marie Lemercier, Julius Richter, Timo Gerkmann

Analysing Diffusion-based Generative Approaches versus Discriminative Approaches for Speech Restoration

Diffusion-based generative models have had a high impact on the computer vision and speech processing communities these past years. Besides data generation tasks, they have also been employed for data restoration tasks like speech…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-03-17 · Jean-Marie Lemercier, Julius Richter, Simon Welker, Timo Gerkmann

Speech Enhancement and Dereverberation with Diffusion-based Generative Models

In this work, we build upon our previous publication and use diffusion-based generative models for speech enhancement. We present a detailed overview of the diffusion process that is based on a stochastic differential equation and delve…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-10-14 · Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Timo Gerkmann

SRTNet: Time Domain Speech Enhancement Via Stochastic Refinement

The diffusion model, a new generative model that is very popular in image generation and audio synthesis, is rarely used in speech enhancement. In this paper, we use the diffusion model as a module for stochastic refinement. We propose…

Sound · Computer Science · 2022-11-01 · Zhibin Qiu, Mengfan Fu, Yinfeng Yu, LiLi Yin, Fuchun Sun, Hao Huang

A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI

Generative AI has demonstrated impressive performance in various fields, among which speech synthesis is an interesting direction. With the diffusion model as the most popular generative model, numerous works have attempted two active…

Sound · Computer Science · 2023-04-04 · Chenshuang Zhang, Chaoning Zhang, Sheng Zheng, Mengchun Zhang, Maryam Qamar, Sung-Ho Bae, In So Kweon

DOLPH: Diffusion Models for Phase Retrieval

Phase retrieval refers to the problem of recovering an image from the magnitudes of its complex-valued linear measurements. Since the problem is ill-posed, the recovery requires prior knowledge on the unknown image. We present DOLPH as a…

Image and Video Processing · Electrical Eng. & Systems · 2022-11-03 · Shirin Shoushtari, Jiaming Liu, Ulugbek S. Kamilov

Combined Generative and Predictive Modeling for Speech Super-resolution

Speech super-resolution (SR) is the task that restores high-resolution speech from low-resolution input. Existing models employ simulated data and constrained experimental settings, which limit generalization to real-world SR. Predictive…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-01-26 · Heming Wang, Eric W. Healy, DeLiang Wang

A Study on Speech Enhancement Based on Diffusion Probabilistic Model

Diffusion probabilistic models have demonstrated an outstanding capability to model natural images and raw audio waveforms through paired diffusion and reverse processes. The unique property of the reverse process (namely, eliminating…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-11-23 · Yen-Ju Lu, Yu Tsao, Shinji Watanabe

Improved probabilistic regression using diffusion models

Probabilistic regression models the entire predictive distribution of a response variable, offering richer insights than classical point estimates and directly allowing for uncertainty quantification. While diffusion-based generative models…

Machine Learning · Computer Science · 2025-10-07 · Carlo Kneissl, Christopher Bülte, Philipp Scholl, Gitta Kutyniok

Conditional Diffusion Probabilistic Model for Speech Enhancement

Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs. While generative models have shown strong potential in speech synthesis, they are…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-02-11 · Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao

Unsupervised speech enhancement with diffusion-based generative models

Recently, conditional score-based diffusion models have gained significant attention in the field of supervised speech enhancement, yielding state-of-the-art performance. However, these methods may face challenges when generalising to…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-20 · Berné Nortier, Mostafa Sadeghi, Romain Serizel

Diffusion Buffer: Online Diffusion-based Speech Enhancement with Sub-Second Latency

Diffusion models are a class of generative models that have been recently used for speech enhancement with remarkable success but are computationally expensive at inference time. Therefore, these models are impractical for processing…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-09-15 · Bunlong Lay, Rostislav Makarov, Timo Gerkmann

Diffusion Models for Audio Restoration

With the development of audio playback devices and fast data transmission, the demand for high sound quality is rising for both entertainment and communications. In this quest for better sound quality, challenges emerge from distortions and…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-11-12 · Jean-Marie Lemercier, Julius Richter, Simon Welker, Eloi Moliner, Vesa Välimäki, Timo Gerkmann

Dynamical Diffusion: Learning Temporal Dynamics with Diffusion Models

Diffusion models have emerged as powerful generative frameworks by progressively adding noise to data through a forward process and then reversing this process to generate realistic samples. While these models have achieved strong…

Machine Learning · Computer Science · 2025-03-04 · Xingzhuo Guo, Yu Zhang, Baixu Chen, Haoran Xu, Jianmin Wang, Mingsheng Long

DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval

Existing audio-text retrieval (ATR) methods are essentially discriminative models that aim to maximize the conditional likelihood, represented as p(candidates|query). Nevertheless, this methodology fails to consider the intrinsic data…

Sound · Computer Science · 2024-10-18 · Yifei Xin, Xuxin Cheng, Zhihong Zhu, Xusheng Yang, Yuexian Zou

Diffusion-based Signal Refiner for Speech Enhancement and Separation

Although recent speech processing technologies have achieved significant improvements in objective metrics, there still remains a gap in human perceptual quality. This paper proposes Diffiner, a novel solution that utilizes the powerful…

Audio and Speech Processing · Electrical Eng. & Systems · 2026-02-11 · Masato Hirano, Ryosuke Sawata, Naoki Murata, Shusuke Takahashi, Yuki Mitsufuji

Extract and Diffuse: Latent Integration for Improved Diffusion-based Speech and Vocal Enhancement

Diffusion-based generative models have recently achieved remarkable results in speech and vocal enhancement due to their ability to model complex speech data distributions. While these models generalize well to unseen acoustic environments,…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-09-23 · Yudong Yang, Zhan Liu, Wenyi Yu, Guangzhi Sun, Qiuqiang Kong, Chao Zhang

Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain

Score-based generative models (SGMs) have recently shown impressive results for difficult generative tasks such as the unconditional and conditional generation of natural images and audio signals. In this work, we extend these models to the…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-07-08 · Simon Welker, Julius Richter, Timo Gerkmann

Diffusion Buffer for Online Generative Speech Enhancement

Online Speech Enhancement was mainly reserved for predictive models. A key advantage of these models is that for an incoming signal frame from a stream of data, the model is called only once for enhancement. In contrast, generative Speech…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-10-22 · Bunlong Lay, Rostislav Makarov, Simon Welker, Maris Hillemann, Timo Gerkmann