Related papers: Diffusion-based Unsupervised Audio-visual Speech E…

200 papers

This paper addresses unsupervised diffusion-based single-channel speech enhancement (SE). Prior work in this direction combines a score-based diffusion model trained on clean speech with a Gaussian noise model whose covariance is structured…

Sound · Computer Science 2026-05-26 Jean-Eudes Ayilo, Mostafa Sadeghi, Romain Serizel, Xavier Alameda-Pineda

Diffusion probabilistic models have demonstrated an outstanding capability to model natural images and raw audio waveforms through paired diffusion and reverse processes. The unique property of the reverse process (namely, eliminating…

Audio and Speech Processing · Electrical Eng. & Systems 2021-11-23 Yen-Ju Lu, Yu Tsao, Shinji Watanabe

Recently, conditional score-based diffusion models have gained significant attention in the field of supervised speech enhancement, yielding state-of-the-art performance. However, these methods may face challenges when generalising to…

Computer Vision and Pattern Recognition · Computer Science 2023-09-20 Berné Nortier, Mostafa Sadeghi, Romain Serizel

Diffusion-based generative models have recently gained attention in speech enhancement (SE), providing an alternative to conventional supervised methods. These models transform clean speech training samples into Gaussian noise centered at…

Computer Vision and Pattern Recognition · Computer Science 2023-09-20 Jean-Eudes Ayilo, Mostafa Sadeghi, Romain Serizel

This work proposes an efficient method to enhance the quality of corrupted speech signals by leveraging both acoustic and visual cues. While existing diffusion-based approaches have demonstrated remarkable quality, their applicability is…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-14 Chaeyoung Jung, Suyeon Lee, Ji-Hoon Kim, Joon Son Chung

Speech enhancement systems are typically trained using pairs of clean and noisy speech. In audio-visual speech enhancement (AVSE), there is not as much ground-truth clean data available; most audio-visual datasets are collected in…

Audio and Speech Processing · Electrical Eng. & Systems 2024-11-05 Ju-Chieh Chou, Chung-Ming Chien, Karen Livescu

This work builds on a previous work on unsupervised speech enhancement using a dynamical variational autoencoder (DVAE) as the clean speech model and non-negative matrix factorization (NMF) as the noise model. We propose to replace the NMF…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-14 Xiaoyu Lin, Simon Leglaive, Laurent Girin, Xavier Alameda-Pineda

This paper introduces an audio-visual speech enhancement system that leverages score-based generative models, also known as diffusion models, conditioned on visual information. In particular, we exploit audio-visual embeddings obtained from…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-05 Julius Richter, Simone Frintrop, Timo Gerkmann

With recent advances in diffusion models, generative speech enhancement (SE) has attracted a surge of research interest due to its great potential for unseen testing noises. However, existing efforts mainly focus on inherent properties of…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-05 Yuchen Hu, Chen Chen, Ruizhe Li, Qiushi Zhu, Eng Siong Chng

In this paper, we address the problem of single-microphone speech separation in the presence of ambient noise. We propose a generative unsupervised technique that directly models both clean speech and structured noise components, training…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-19 Yochai Yemini, Rami Ben-Ari, Sharon Gannot, Ethan Fetaya

This paper introduces a novel speech enhancement (SE) approach based on a denoising diffusion probabilistic model (DDPM), termed Guided diffusion for speech enhancement (GDiffuSE). In contrast to conventional methods that directly map noisy…

Sound · Computer Science 2026-03-03 Efrayim Yanir, David Burshtein, Sharon Gannot

The goal of speech enhancement (SE) is to eliminate the background interference from the noisy speech signal. Generative models such as diffusion models (DM) have been applied to the task of SE because of better generalization in unseen…

Sound · Computer Science 2023-09-06 Wen Wang, Dongchao Yang, Qichen Ye, Bowen Cao, Yuexian Zou

We explore unsupervised speech enhancement using diffusion models as expressive generative priors for clean speech. Existing approaches guide the reverse diffusion process using noisy speech through an approximate, noise-perturbed…

Sound · Computer Science 2025-07-04 Mostafa Sadeghi, Jean-Eudes Ayilo, Romain Serizel, Xavier Alameda-Pineda

In this paper, we are interested in audio-visual speech separation given a single-channel audio recording as well as visual information (lip movements) associated with each speaker. We propose an unsupervised technique based on…

Audio and Speech Processing · Electrical Eng. & Systems 2021-09-01 Viet-Nhat Nguyen, Mostafa Sadeghi, Elisa Ricci, Xavier Alameda-Pineda

In this work, we build upon our previous publication and use diffusion-based generative models for speech enhancement. We present a detailed overview of the diffusion process that is based on a stochastic differential equation and delve…

Audio and Speech Processing · Electrical Eng. & Systems 2025-10-14 Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Timo Gerkmann

This paper presents a statistical method of single-channel speech enhancement that uses a variational autoencoder (VAE) as a prior distribution on clean speech. A standard approach to speech enhancement is to train a deep neural network…

We enhance the vanilla adversarial training method for unsupervised Automatic Speech Recognition (ASR) by a diffusion-GAN. Our model (1) injects instance noises of various intensities to the generator's output and unlabeled reference text…

Computation and Language · Computer Science 2023-03-27 Xianchao Wu

Recently, an audio-visual speech generative model based on variational autoencoder (VAE) has been proposed, which is combined with a nonnegative matrix factorization (NMF) model for noise variance to perform unsupervised speech enhancement.…

Audio and Speech Processing · Electrical Eng. & Systems 2019-11-12 Mostafa Sadeghi, Xavier Alameda-Pineda

Speech enhancement (SE) is the foundational task of enhancing the clarity and quality of speech in the presence of non-stationary additive noise. While deterministic deep learning models have been commonly employed for SE, recent research…

Audio and Speech Processing · Electrical Eng. & Systems 2025-03-11 Sonal Kumar, Sreyan Ghosh, Utkarsh Tyagi, Anton Jeran Ratnarajah, Chandra Kiran Reddy Evuru, Ramani Duraiswami, Dinesh Manocha

Unsupervised Anomalous Sound Detection (ASD) aims to design a generalizable method that can be used to detect anomalies when only normal sounds are given. In this paper, Anomalous Sound Detection based on Diffusion Models (ASD-Diffusion) is…

Sound · Computer Science 2024-09-25 Fengrun Zhang, Xiang Xie, Kai Guo