Related papers: DiffWave: A Versatile Diffusion Model for Audio Sy…

200 papers

Diffusion models have recently been shown to be relevant for high-quality speech generation. Most work has been focused on generating spectrograms, and as such, they further require a subsequent model to convert the spectrogram to a…

Sound · Computer Science · 2024-03-12 · Roi Benita, Michael Elad, Joseph Keshet

This paper introduces UnDiff, a diffusion probabilistic model capable of solving various speech inverse tasks. Once trained for unconditional speech waveform generation, it can be adapted to different tasks including…

Diffusion probabilistic models have demonstrated an outstanding capability to model natural images and raw audio waveforms through paired diffusion and reverse processes. The unique property of the reverse process (namely, eliminating…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-11-23 · Yen-Ju Lu, Yu Tsao, Shinji Watanabe

This paper introduces a novel data-driven strategy for synthesizing gramophone noise audio textures. A diffusion probabilistic model is applied to generate highly realistic quasiperiodic noises. The proposed model is designed to generate…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-07-01 · Eloi Moliner, Vesa Välimäki

This paper introduces WaveGrad, a conditional model for waveform generation which estimates gradients of the data density. The model is built on prior work on score matching and diffusion probabilistic models. It starts from a Gaussian…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-10-12 · Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, William Chan

Deep generative models can generate high-fidelity audio conditioned on various types of representations (e.g., mel-spectrograms, Mel-frequency Cepstral Coefficients (MFCC)). Recently, such models have been used to synthesize audio waveforms…

Diffusion probabilistic models have been shown to generate state-of-the-art results on several competitive image synthesis benchmarks but lack a low-dimensional, interpretable latent space, and are slow at generation. On the other hand,…

Machine Learning · Computer Science · 2022-11-30 · Kushagra Pandey, Avideep Mukherjee, Piyush Rai, Abhishek Kumar

This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones;…

This paper proposes a new unsupervised audio-visual speech enhancement (AVSE) approach that combines a diffusion-based audio-visual speech generative model with a non-negative matrix factorization (NMF) noise model. First, the diffusion…

Sound · Computer Science · 2025-01-16 · Jean-Eudes Ayilo, Mostafa Sadeghi, Romain Serizel, Xavier Alameda-Pineda

Seismic wave generation creates labeled waveform datasets for source parameter inversion, subsurface analysis, and, notably, training artificial intelligence seismology models. Traditionally, seismic wave generation has been time-consuming,…

Geophysics · Physics · 2025-09-23 · Longfei Duan, Zicheng Zhang, Lianqing Zhou, Congying Han, Lei Bai, Tiande Guo, Cuiping Zhao

Denoising diffusion probabilistic models (DDPMs) have recently achieved leading performance in many generative tasks. However, the cost of their inherent iterative sampling process has hindered their application to speech synthesis. This paper…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-04-22 · Rongjie Huang, Max W. Y. Lam, Jun Wang, Dan Su, Dong Yu, Yi Ren, Zhou Zhao

Singing voice synthesis (SVS) systems are built to synthesize high-quality and expressive singing voice, in which the acoustic model generates the acoustic features (e.g., mel-spectrogram) given a music score. Previous singing acoustic…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-03-23 · Jinglin Liu, Chengxi Li, Yi Ren, Feiyang Chen, Zhou Zhao

Advancements in artificial intelligence and machine learning have significantly improved synthetic speech generation. This paper explores diffusion models, a novel method for creating realistic synthetic speech. We create a diffusion…

Cryptography and Security · Computer Science · 2025-01-15 · Anton Firc, Kamil Malinka, Petr Hanáček

Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs. While generative models have shown strong potential in speech synthesis, they are…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-02-11 · Yen-Ju Lu, Zhong-Qiu Wang, Shinji Watanabe, Alexander Richard, Cheng Yu, Yu Tsao

Training diffusion models for audiovisual sequences allows for a range of generation tasks by learning conditional distributions of various input-output combinations of the two modalities. Nevertheless, this strategy often requires training…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-10 · Gwanghyun Kim, Alonso Martinez, Yu-Chuan Su, Brendan Jou, José Lezama, Agrim Gupta, Lijun Yu, Lu Jiang, Aren Jansen, Jacob Walker, Krishna Somandepalli

Generative adversarial network (GAN) models can synthesize high-quality audio signals while ensuring fast sample generation. However, they are difficult to train and are prone to several issues, including mode collapse and divergence. In this…

Sound · Computer Science · 2024-02-06 · Teysir Baoueb, Haocheng Liu, Mathieu Fontaine, Jonathan Le Roux, Gael Richard

Denoising Diffusion Probabilistic Models have shown extraordinary ability on various generative tasks. However, their slow inference speed renders them impractical in speech synthesis. This paper proposes a linear diffusion model (LinDiff)…

Sound · Computer Science · 2023-06-13 · Haogeng Liu, Tao Wang, Jie Cao, Ran He, Jianhua Tao

There are many deterministic mathematical operations (e.g., compression, clipping, downsampling) that degrade speech quality considerably. In this paper we introduce a neural network architecture, based on a modification of the DiffWave…

Sound · Computer Science · 2021-09-03 · Jianwei Zhang, Suren Jayasuriya, Visar Berisha

Directly sending audio signals from a transmitter to a receiver across a noisy channel may consume considerable bandwidth and be prone to errors when trying to recover the transmitted bits. In contrast, the recent semantic communication…

Sound · Computer Science · 2023-09-15 · Eleonora Grassucci, Christian Marinoni, Andrea Rodriguez, Danilo Comminiello

Voice conversion is a method that transforms speaking style while preserving the linguistic content. Many researchers use deep generative models for voice conversion tasks. Generative…

Sound · Computer Science · 2023-08-29 · Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao