English
Related papers

Related papers: DiffPhase: Generative Diffusion-based STFT Phase R…

200 papers

Diffusion models have shown a great ability at bridging the performance gap between predictive and generative approaches for speech enhancement. We have shown that they may even outperform their predictive counterparts for non-additive…

Audio and Speech Processing · Electrical Eng. & Systems 2024-03-13 Jean-Marie Lemercier , Julius Richter , Simon Welker , Timo Gerkmann

Diffusion models have shown promising results in speech enhancement, using a task-adapted diffusion process for the conditional generation of clean speech given a noisy mixture. However, at test time, the neural network used for score…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-17 Bunlong Lay , Jean-Marie Lemercier , Julius Richter , Timo Gerkmann

Diffusion-based generative models have had a high impact on the computer vision and speech processing communities these past years. Besides data generation tasks, they have also been employed for data restoration tasks like speech…

Audio and Speech Processing · Electrical Eng. & Systems 2023-03-17 Jean-Marie Lemercier , Julius Richter , Simon Welker , Timo Gerkmann

In this work, we build upon our previous publication and use diffusion-based generative models for speech enhancement. We present a detailed overview of the diffusion process that is based on a stochastic differential equation and delve…

Audio and Speech Processing · Electrical Eng. & Systems 2025-10-14 Julius Richter , Simon Welker , Jean-Marie Lemercier , Bunlong Lay , Timo Gerkmann

Diffusion model, as a new generative model which is very popular in image generation and audio synthesis, is rarely used in speech enhancement. In this paper, we use the diffusion model as a module for stochastic refinement. We propose…

Sound · Computer Science 2022-11-01 Zhibin Qiu , Mengfan Fu , Yinfeng Yu , LiLi Yin , Fuchun Sun , Hao Huang

Generative AI has demonstrated impressive performance in various fields, among which speech synthesis is an interesting direction. With the diffusion model as the most popular generative model, numerous works have attempted two active…

Phase retrieval refers to the problem of recovering an image from the magnitudes of its complex-valued linear measurements. Since the problem is ill-posed, the recovery requires prior knowledge on the unknown image. We present DOLPH as a…

Image and Video Processing · Electrical Eng. & Systems 2022-11-03 Shirin Shoushtari , Jiaming Liu , Ulugbek S. Kamilov

Speech super-resolution (SR) is the task that restores high-resolution speech from low-resolution input. Existing models employ simulated data and constrained experimental settings, which limit generalization to real-world SR. Predictive…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-26 Heming Wang , Eric W. Healy , DeLiang Wang

Diffusion probabilistic models have demonstrated an outstanding capability to model natural images and raw audio waveforms through a paired diffusion and reverse processes. The unique property of the reverse process (namely, eliminating…

Audio and Speech Processing · Electrical Eng. & Systems 2021-11-23 Yen-Ju Lu , Yu Tsao , Shinji Watanabe

Probabilistic regression models the entire predictive distribution of a response variable, offering richer insights than classical point estimates and directly allowing for uncertainty quantification. While diffusion-based generative models…

Machine Learning · Computer Science 2025-10-07 Carlo Kneissl , Christopher Bülte , Philipp Scholl , Gitta Kutyniok

Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs. While generative models have shown strong potential in speech synthesis, they are…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-11 Yen-Ju Lu , Zhong-Qiu Wang , Shinji Watanabe , Alexander Richard , Cheng Yu , Yu Tsao

Recently, conditional score-based diffusion models have gained significant attention in the field of supervised speech enhancement, yielding state-of-the-art performance. However, these methods may face challenges when generalising to…

Computer Vision and Pattern Recognition · Computer Science 2023-09-20 Berné Nortier , Mostafa Sadeghi , Romain Serizel

Diffusion models are a class of generative models that have been recently used for speech enhancement with remarkable success but are computationally expensive at inference time. Therefore, these models are impractical for processing…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-15 Bunlong Lay , Rostislav Makarov , Timo Gerkmann

With the development of audio playback devices and fast data transmission, the demand for high sound quality is rising for both entertainment and communications. In this quest for better sound quality, challenges emerge from distortions and…

Audio and Speech Processing · Electrical Eng. & Systems 2024-11-12 Jean-Marie Lemercier , Julius Richter , Simon Welker , Eloi Moliner , Vesa Välimäki , Timo Gerkmann

Diffusion models have emerged as powerful generative frameworks by progressively adding noise to data through a forward process and then reversing this process to generate realistic samples. While these models have achieved strong…

Machine Learning · Computer Science 2025-03-04 Xingzhuo Guo , Yu Zhang , Baixu Chen , Haoran Xu , Jianmin Wang , Mingsheng Long

Existing audio-text retrieval (ATR) methods are essentially discriminative models that aim to maximize the conditional likelihood, represented as p(candidates|query). Nevertheless, this methodology fails to consider the intrinsic data…

Sound · Computer Science 2024-10-18 Yifei Xin , Xuxin Cheng , Zhihong Zhu , Xusheng Yang , Yuexian Zou

Although recent speech processing technologies have achieved significant improvements in objective metrics, there still remains a gap in human perceptual quality. This paper proposes Diffiner, a novel solution that utilizes the powerful…

Audio and Speech Processing · Electrical Eng. & Systems 2026-02-11 Masato Hirano , Ryosuke Sawata , Naoki Murata , Shusuke Takahashi , Yuki Mitsufuji

Diffusion-based generative models have recently achieved remarkable results in speech and vocal enhancement due to their ability to model complex speech data distributions. While these models generalize well to unseen acoustic environments,…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-23 Yudong Yang , Zhan Liu , Wenyi Yu , Guangzhi Sun , Qiuqiang Kong , Chao Zhang

Score-based generative models (SGMs) have recently shown impressive results for difficult generative tasks such as the unconditional and conditional generation of natural images and audio signals. In this work, we extend these models to the…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-08 Simon Welker , Julius Richter , Timo Gerkmann

Online Speech Enhancement was mainly reserved for predictive models. A key advantage of these models is that for an incoming signal frame from a stream of data, the model is called only once for enhancement. In contrast, generative Speech…

Audio and Speech Processing · Electrical Eng. & Systems 2025-10-22 Bunlong Lay , Rostislav Makarov , Simon Welker , Maris Hillemann , Timo Gerkmann
‹ Prev 1 2 3 10 Next ›