Related papers: Complex-Cycle-Consistent Diffusion Model for Monau…

Self-Supervised Learning based Monaural Speech Enhancement with Complex-Cycle-Consistent

Recently, self-supervised learning (SSL) techniques have been introduced to solve the monaural speech enhancement problem. Due to the lack of using clean phase information, the enhancement performance is limited in most SSL methods.…

Sound · Computer Science 2021-12-22 Yi Li , Yang Sun , Syed Mohsen Naqvi

MDDM: A Multi-view Discriminative Enhanced Diffusion-based Model for Speech Enhancement

With the development of deep learning, speech enhancement has been greatly optimized in terms of speech quality. Previous methods typically focus on the discriminative supervised learning or generative modeling, which tends to introduce…

Audio and Speech Processing · Electrical Eng. & Systems 2025-10-31 Nan Xu , Zhaolong Huang , Xiaonan Zhi

Conditional Latent Diffusion-Based Speech Enhancement Via Dual Context Learning

Recently, the application of diffusion probabilistic models has advanced speech enhancement through generative approaches. However, existing diffusion-based methods have focused on the generation process in high-dimensional waveform or…

Sound · Computer Science 2025-01-20 Shengkui Zhao , Zexu Pan , Kun Zhou , Yukun Ma , Chong Zhang , Bin Ma

NADiffuSE: Noise-aware Diffusion-based Model for Speech Enhancement

The goal of speech enhancement (SE) is to eliminate the background interference from the noisy speech signal. Generative models such as diffusion models (DM) have been applied to the task of SE because of better generalization in unseen…

Sound · Computer Science 2023-09-06 Wen Wang , Dongchao Yang , Qichen Ye , Bowen Cao , Yuexian Zou

Real-time Monaural Speech Enhancement With Short-time Discrete Cosine Transform

Speech enhancement algorithms based on deep learning have been improved in terms of speech intelligibility and perceptual quality greatly. Many methods focus on enhancing the amplitude spectrum while reconstructing speech using the mixture…

Audio and Speech Processing · Electrical Eng. & Systems 2021-02-10 Qinglong Li , Fei Gao , Haixin Guan , Kaichi Ma

A Study on Speech Enhancement Based on Diffusion Probabilistic Model

Diffusion probabilistic models have demonstrated an outstanding capability to model natural images and raw audio waveforms through a paired diffusion and reverse processes. The unique property of the reverse process (namely, eliminating…

Audio and Speech Processing · Electrical Eng. & Systems 2021-11-23 Yen-Ju Lu , Yu Tsao , Shinji Watanabe

End-to-End Model for Speech Enhancement by Consistent Spectrogram Masking

Recently, phase processing is attracting increasing interest in speech enhancement community. Some researchers integrate phase estimations module into speech enhancement models by using complex-valued short-time Fourier transform (STFT)…

Sound · Computer Science 2019-01-03 Xingjian Du , Mengyao Zhu , Xuan Shi , Xinpeng Zhang , Wen Zhang , Jingdong Chen

DisContSE: Single-Step Diffusion Speech Enhancement Based on Joint Discrete and Continuous Embeddings

Diffusion speech enhancement on discrete audio codec features gain immense attention due to their improved speech component reconstruction capability. However, they usually suffer from high inference computational complexity due to multiple…

Audio and Speech Processing · Electrical Eng. & Systems 2026-01-30 Yihui Fu , Tim Fingscheidt

Noise-aware Speech Enhancement using Diffusion Probabilistic Model

With recent advances of diffusion model, generative speech enhancement (SE) has attracted a surge of research interest due to its great potential for unseen testing noises. However, existing efforts mainly focus on inherent properties of…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-05 Yuchen Hu , Chen Chen , Ruizhe Li , Qiushi Zhu , Eng Siong Chng

Magnitude-and-phase-aware Speech Enhancement with Parallel Sequence Modeling

In speech enhancement (SE), phase estimation is important for perceptual quality, so many methods take clean speech's complex short-time Fourier transform (STFT) spectrum or the complex ideal ratio mask (cIRM) as the learning target. To…

Audio and Speech Processing · Electrical Eng. & Systems 2023-10-12 Yuewei Zhang , Huanbin Zou , Jie Zhu

Cycle Diffusion Model for Counterfactual Image Generation

Deep generative models have demonstrated remarkable success in medical image synthesis. However, ensuring conditioning faithfulness and high-quality synthetic images for direct or counterfactual generation remains a challenge. In this work,…

Computer Vision and Pattern Recognition · Computer Science 2025-10-31 Fangrui Huang , Alan Wang , Binxu Li , Bailey Trang , Ridvan Yesiloglu , Tianyu Hua , Wei Peng , Ehsan Adeli

AMDM-SE: Attention-based Multichannel Diffusion Model for Speech Enhancement

Diffusion models have recently achieved impressive results in reconstructing images from noisy inputs, and similar ideas have been applied to speech enhancement by treating time-frequency representations as images. With the ubiquity of…

Audio and Speech Processing · Electrical Eng. & Systems 2026-01-21 Renana Opochinsky , Sharon Gannot

Cycle-Consistent Speech Enhancement

Feature mapping using deep neural networks is an effective approach for single-channel speech enhancement. Noisy features are transformed to the enhanced ones through a mapping network and the mean square errors between the enhanced and…

Audio and Speech Processing · Electrical Eng. & Systems 2019-05-02 Zhong Meng , Jinyu Li , Yifan Gong , Biing-Hwang Juang

SEED: Speaker Embedding Enhancement Diffusion Model

A primary challenge when deploying speaker recognition systems in real-world applications is performance degradation caused by environmental mismatch. We propose a diffusion-based method that takes speaker embeddings extracted from a…

Audio and Speech Processing · Electrical Eng. & Systems 2025-05-23 KiHyun Nam , Jungwoo Heo , Jee-weon Jung , Gangin Park , Chaeyoung Jung , Ha-Jin Yu , Joon Son Chung

Restoring degraded speech via a modified diffusion model

There are many deterministic mathematical operations (e.g. compression, clipping, downsampling) that degrade speech quality considerably. In this paper we introduce a neural network architecture, based on a modification of the DiffWave…

Sound · Computer Science 2021-09-03 Jianwei Zhang , Suren Jayasuriya , Visar Berisha

Joint magnitude estimation and phase recovery using Cycle-in-Cycle GAN for non-parallel speech enhancement

For the lack of adequate paired noisy-clean speech corpus in many real scenarios, non-parallel training is a promising task for DNN-based speech enhancement methods. However, because of the severe mismatch between input and target speeches,…

Sound · Computer Science 2022-02-15 Guochen Yu , Andong Li , Yutian Wang , Yinuo Guo , Hui Wang , Chengshi Zheng

Conditional Diffusion Probabilistic Model for Speech Enhancement

Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs. While generative models have shown strong potential in speech synthesis, they are…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-11 Yen-Ju Lu , Zhong-Qiu Wang , Shinji Watanabe , Alexander Richard , Cheng Yu , Yu Tsao

Single and Few-step Diffusion for Generative Speech Enhancement

Diffusion models have shown promising results in speech enhancement, using a task-adapted diffusion process for the conditional generation of clean speech given a noisy mixture. However, at test time, the neural network used for score…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-17 Bunlong Lay , Jean-Marie Lemercier , Julius Richter , Timo Gerkmann

A Two-stage Complex Network using Cycle-consistent Generative Adversarial Networks for Speech Enhancement

Cycle-consistent generative adversarial networks (CycleGAN) have shown their promising performance for speech enhancement (SE), while one intractable shortcoming of these CycleGAN-based SE systems is that the noise components propagate…

Sound · Computer Science 2021-09-07 Guochen Yu , Yutian Wang , Hui Wang , Qin Zhang , Chengshi Zheng

Phase-aware Speech Enhancement with Deep Complex U-Net

Most deep learning-based models for speech enhancement have mainly focused on estimating the magnitude of spectrogram while reusing the phase from noisy speech for reconstruction. This is due to the difficulty of estimating the phase of…

Sound · Computer Science 2019-04-03 Hyeong-Seok Choi , Jang-Hyun Kim , Jaesung Huh , Adrian Kim , Jung-Woo Ha , Kyogu Lee