Related papers: DiffWave: A Versatile Diffusion Model for Audio Sy…

DiffAR: Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation

Diffusion models have recently been shown to be relevant for high-quality speech generation. Most work has been focused on generating spectrograms, and as such, they further require a subsequent model to convert the spectrogram to a…

Sound · Computer Science 2024-03-12 Roi Benita, Michael Elad, Joseph Keshet

UnDiff: Unsupervised Voice Restoration with Unconditional Diffusion Model

This paper introduces UnDiff, a diffusion probabilistic model capable of solving various speech inverse tasks. Being once trained for speech waveform generation in an unconditional manner, it can be adapted to different tasks including…

Sound · Computer Science 2023-10-13 Anastasiia Iashchenko, Pavel Andreev, Ivan Shchekotov, Nicholas Babaev, Dmitry Vetrov

A Study on Speech Enhancement Based on Diffusion Probabilistic Model

Diffusion probabilistic models have demonstrated an outstanding capability to model natural images and raw audio waveforms through a paired diffusion and reverse processes. The unique property of the reverse process (namely, eliminating…

Audio and Speech Processing · Electrical Eng. & Systems 2021-11-23 Yen-Ju Lu, Yu Tsao, Shinji Watanabe

Realistic Gramophone Noise Synthesis using a Diffusion Model

This paper introduces a novel data-driven strategy for synthesizing gramophone noise audio textures. A diffusion probabilistic model is applied to generate highly realistic quasiperiodic noises. The proposed model is designed to generate…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-01 Eloi Moliner, Vesa Välimäki

WaveGrad: Estimating Gradients for Waveform Generation

This paper introduces WaveGrad, a conditional model for waveform generation which estimates gradients of the data density. The model is built on prior work on score matching and diffusion probabilistic models. It starts from a Gaussian…

Audio and Speech Processing · Electrical Eng. & Systems 2020-10-12 Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, William Chan

From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion

Deep generative models can generate high-fidelity audio conditioned on various types of representations (e.g., mel-spectrograms, Mel-frequency Cepstral Coefficients (MFCC)). Recently, such models have been used to synthesize audio waveforms…

Sound · Computer Science 2023-11-09 Robin San Roman, Yossi Adi, Antoine Deleforge, Romain Serizel, Gabriel Synnaeve, Alexandre Défossez

DiffuseVAE: Efficient, Controllable and High-Fidelity Generation from Low-Dimensional Latents

Diffusion probabilistic models have been shown to generate state-of-the-art results on several competitive image synthesis benchmarks but lack a low-dimensional, interpretable latent space, and are slow at generation. On the other hand,…

Machine Learning · Computer Science 2022-11-30 Kushagra Pandey, Avideep Mukherjee, Piyush Rai, Abhishek Kumar

WaveNet: A Generative Model for Raw Audio

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones;…

Sound · Computer Science 2016-09-20 Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu

Diffusion-based Unsupervised Audio-visual Speech Enhancement

This paper proposes a new unsupervised audio-visual speech enhancement (AVSE) approach that combines a diffusion-based audio-visual speech generative model with a non-negative matrix factorization (NMF) noise model. First, the diffusion…

Sound · Computer Science 2025-01-16 Jean-Eudes Ayilo, Mostafa Sadeghi, Romain Serizel, Xavier Alameda-Pineda

A Multi-conditional Diffusion Transformer for Versatile Seismic Wave Generation

Seismic wave generation creates labeled waveform datasets for source parameter inversion, subsurface analysis, and, notably, training artificial intelligence seismology models. Traditionally, seismic wave generation has been time-consuming,…

Geophysics · Physics 2025-09-23 Longfei Duan, Zicheng Zhang, Lianqing Zhou, Congying Han, Lei Bai, Tiande Guo, Cuiping Zhao

FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis

Denoising diffusion probabilistic models (DDPMs) have recently achieved leading performances in many generative tasks. However, the inherited iterative sampling process costs hindered their applications to speech synthesis. This paper…

Audio and Speech Processing · Electrical Eng. & Systems 2022-04-22 Rongjie Huang, Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu, Yi Ren, Zhou Zhao

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism

Singing voice synthesis (SVS) systems are built to synthesize high-quality and expressive singing voice, in which the acoustic model generates the acoustic features (e.g., mel-spectrogram) given a music score. Previous singing acoustic…

Audio and Speech Processing · Electrical Eng. & Systems 2022-03-23 Jinglin Liu, Chengxi Li, Yi Ren, Feiyang Chen, Zhou Zhao

Diffuse or Confuse: A Diffusion Deepfake Speech Dataset

Advancements in artificial intelligence and machine learning have significantly improved synthetic speech generation. This paper explores diffusion models, a novel method for creating realistic synthetic speech. We create a diffusion…

Cryptography and Security · Computer Science 2025-01-15 Anton Firc, Kamil Malinka, Petr Hanáček

Conditional Diffusion Probabilistic Model for Speech Enhancement

Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs. While generative models have shown strong potential in speech synthesis, they are…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-11 Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao

A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation

Training diffusion models for audiovisual sequences allows for a range of generation tasks by learning conditional distributions of various input-output combinations of the two modalities. Nevertheless, this strategy often requires training…

Computer Vision and Pattern Recognition · Computer Science 2025-06-10 Gwanghyun Kim, Alonso Martinez, Yu-Chuan Su, Brendan Jou, José Lezama, Agrim Gupta, Lijun Yu, Lu Jiang, Aren Jansen, Jacob Walker, Krishna Somandepalli

SpecDiff-GAN: A Spectrally-Shaped Noise Diffusion GAN for Speech and Music Synthesis

Generative adversarial network (GAN) models can synthesize high-quality audio signals while ensuring fast sample generation. However, they are difficult to train and are prone to several issues including mode collapse and divergence. In this…

Sound · Computer Science 2024-02-06 Teysir Baoueb, Haocheng Liu, Mathieu Fontaine, Jonathan Le Roux, Gael Richard

Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion

Denoising Diffusion Probabilistic Models have shown extraordinary ability on various generative tasks. However, their slow inference speed renders them impractical in speech synthesis. This paper proposes a linear diffusion model (LinDiff)…

Sound · Computer Science 2023-06-13 Haogeng Liu, Tao Wang, Jie Cao, Ran He, Jianhua Tao

Restoring degraded speech via a modified diffusion model

There are many deterministic mathematical operations (e.g. compression, clipping, downsampling) that degrade speech quality considerably. In this paper we introduce a neural network architecture, based on a modification of the DiffWave…

Sound · Computer Science 2021-09-03 Jianwei Zhang, Suren Jayasuriya, Visar Berisha

Diffusion models for audio semantic communication

Directly sending audio signals from a transmitter to a receiver across a noisy channel may absorb consistent bandwidth and be prone to errors when trying to recover the transmitted bits. On the contrary, the recent semantic communication…

Sound · Computer Science 2023-09-15 Eleonora Grassucci, Christian Marinoni, Andrea Rodriguez, Danilo Comminiello

Voice Conversion with Denoising Diffusion Probabilistic GAN Models

Voice conversion is a method that allows for the transformation of speaking style while maintaining the integrity of linguistic information. There are many researchers using deep generative models for voice conversion tasks. Generative…

Sound · Computer Science 2023-08-29 Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao