Related papers: TransFusion: Transcribing Speech with Multinomial …

Empowering Diffusion Models on the Embedding Space for Text Generation

Diffusion models have achieved state-of-the-art synthesis quality on both visual and audio tasks, and recent works further adapt them to textual data by diffusing on the embedding space. In this paper, we conduct systematic studies of the…

Computation and Language · Computer Science 2024-04-23 Zhujin Gao , Junliang Guo , Xu Tan , Yongxin Zhu , Fang Zhang , Jiang Bian , Linli Xu

On the Application of Diffusion Models for Simultaneous Denoising and Dereverberation

Diffusion models have been shown to achieve natural-sounding enhancement of speech degraded by noise or reverberation. However, their simultaneous denoising and dereverberation capability has so far not been studied much, although this is…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-27 Adrian Meise , Tobias Cord-Landwehr , Reinhold Haeb-Umbach

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

We introduce Transfusion, a recipe for training a multi-modal model over discrete and continuous data. Transfusion combines the language modeling loss function (next token prediction) with diffusion to train a single transformer over…

Artificial Intelligence · Computer Science 2024-08-21 Chunting Zhou , Lili Yu , Arun Babu , Kushal Tirumala , Michihiro Yasunaga , Leonid Shamis , Jacob Kahn , Xuezhe Ma , Luke Zettlemoyer , Omer Levy

Diffusion Model-Based Image Editing: A Survey

Denoising diffusion models have emerged as a powerful tool for various image generation and editing tasks, facilitating the synthesis of visual content in an unconditional or input-conditional manner. The core idea behind them is learning…

Computer Vision and Pattern Recognition · Computer Science 2025-03-12 Yi Huang , Jiancheng Huang , Yifan Liu , Mingfu Yan , Jiaxi Lv , Jianzhuang Liu , Wei Xiong , He Zhang , Liangliang Cao , Shifeng Chen

Real-World Denoising via Diffusion Model

Real-world image denoising is an extremely important image processing problem, which aims to recover clean images from noisy images captured in natural environments. In recent years, diffusion models have achieved very promising results in…

Computer Vision and Pattern Recognition · Computer Science 2023-05-09 Cheng Yang , Lijing Liang , Zhixun Su

Resfusion: Denoising Diffusion Probabilistic Models for Image Restoration Based on Prior Residual Noise

Recently, research on denoising diffusion models has expanded its application to the field of image restoration. Traditional diffusion-based image restoration methods utilize degraded images as conditional input to effectively guide the…

Computer Vision and Pattern Recognition · Computer Science 2024-10-25 Zhenning Shi , Haoshuai Zheng , Chen Xu , Changsheng Dong , Bin Pan , Xueshuo Xie , Along He , Tao Li , Huazhu Fu

DDRF: Denoising Diffusion Model for Remote Sensing Image Fusion

Denosing diffusion model, as a generative model, has received a lot of attention in the field of image generation recently, thanks to its powerful generation capability. However, diffusion models have not yet received sufficient research in…

Computer Vision and Pattern Recognition · Computer Science 2023-04-12 ZiHan Cao , ShiQi Cao , Xiao Wu , JunMing Hou , Ran Ran , Liang-Jian Deng

Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy

Diffusion models have recently achieved great success in the synthesis of high-quality images and videos. However, the existing denoising techniques in diffusion models are commonly based on step-by-step noise predictions, which suffers…

Computer Vision and Pattern Recognition · Computer Science 2024-10-15 Hancheng Ye , Jiakang Yuan , Renqiu Xia , Xiangchao Yan , Tao Chen , Junchi Yan , Botian Shi , Bo Zhang

Posterior Transition Modeling for Unsupervised Diffusion-Based Speech Enhancement

We explore unsupervised speech enhancement using diffusion models as expressive generative priors for clean speech. Existing approaches guide the reverse diffusion process using noisy speech through an approximate, noise-perturbed…

Sound · Computer Science 2025-07-04 Mostafa Sadeghi , Jean-Eudes Ayilo , Romain Serizel , Xavier Alameda-Pineda

Complex Image-Generative Diffusion Transformer for Audio Denoising

The audio denoising technique has captured widespread attention in the deep neural network field. Recently, the audio denoising problem has been converted into an image generation task, and deep learning-based approaches have been applied…

Sound · Computer Science 2024-06-14 Junhui Li , Pu Wang , Jialu Li , Youshan Zhang

Investigating the Design Space of Diffusion Models for Speech Enhancement

Diffusion models are a new class of generative models that have shown outstanding performance in image generation literature. As a consequence, studies have attempted to apply diffusion models to other tasks, such as speech enhancement. A…

Audio and Speech Processing · Electrical Eng. & Systems 2024-10-10 Philippe Gonzalez , Zheng-Hua Tan , Jan Østergaard , Jesper Jensen , Tommy Sonne Alstrøm , Tobias May

Diffusion Models for Constrained Domains

Denoising diffusion models are a novel class of generative algorithms that achieve state-of-the-art performance across a range of domains, including image generation and text-to-image tasks. Building on this success, diffusion models have…

Machine Learning · Computer Science 2024-03-08 Nic Fishman , Leo Klarner , Valentin De Bortoli , Emile Mathieu , Michael Hutchinson

Cold Diffusion for Speech Enhancement

Diffusion models have recently shown promising results for difficult enhancement tasks such as the conditional and unconditional restoration of natural images and audio signals. In this work, we explore the possibility of leveraging a…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-24 Hao Yen , François G. Germain , Gordon Wichern , Jonathan Le Roux

Neural Network Diffusion

Diffusion models have achieved remarkable success in image and video generation. In this work, we demonstrate that diffusion models can also \textit{generate high-performing neural network parameters}. Our approach is simple, utilizing an…

Machine Learning · Computer Science 2025-01-03 Kai Wang , Dongwen Tang , Boya Zeng , Yida Yin , Zhaopan Xu , Yukun Zhou , Zelin Zang , Trevor Darrell , Zhuang Liu , Yang You

DiffusionInst: Diffusion Model for Instance Segmentation

Diffusion frameworks have achieved comparable performance with previous state-of-the-art image generation models. Researchers are curious about its variants in discriminative tasks because of its powerful noise-to-image denoising pipeline.…

Computer Vision and Pattern Recognition · Computer Science 2022-12-29 Zhangxuan Gu , Haoxing Chen , Zhuoer Xu , Jun Lan , Changhua Meng , Weiqiang Wang

A Comprehensive Survey on Diffusion Models and Their Applications

Diffusion Models are probabilistic models that create realistic samples by simulating the diffusion process, gradually adding and removing noise from data. These models have gained popularity in domains such as image processing, speech…

Computer Vision and Pattern Recognition · Computer Science 2024-08-21 Md Manjurul Ahsan , Shivakumar Raman , Yingtao Liu , Zahed Siddique

Diffusion Buffer: Online Diffusion-based Speech Enhancement with Sub-Second Latency

Diffusion models are a class of generative models that have been recently used for speech enhancement with remarkable success but are computationally expensive at inference time. Therefore, these models are impractical for processing…

Audio and Speech Processing · Electrical Eng. & Systems 2025-09-15 Bunlong Lay , Rostislav Makarov , Timo Gerkmann

Diffusion Models in Vision: A Survey

Denoising diffusion models represent a recent emerging topic in computer vision, demonstrating remarkable results in the area of generative modeling. A diffusion model is a deep generative model that is based on two stages, a forward…

Computer Vision and Pattern Recognition · Computer Science 2025-01-17 Florinel-Alin Croitoru , Vlad Hondru , Radu Tudor Ionescu , Mubarak Shah

DiffVoice: Text-to-Speech with Latent Diffusion

In this work, we present DiffVoice, a novel text-to-speech model based on latent diffusion. We propose to first encode speech signals into a phoneme-rate latent representation with a variational autoencoder enhanced by adversarial training,…

Audio and Speech Processing · Electrical Eng. & Systems 2023-04-25 Zhijun Liu , Yiwei Guo , Kai Yu

Self-diffusion for Solving Inverse Problems

We propose self-diffusion, a novel framework for solving inverse problems without relying on pretrained generative models. Traditional diffusion-based approaches require training a model on a clean dataset to learn to reverse the forward…

Machine Learning · Computer Science 2025-12-09 Guanxiong Luo , Shoujin Huang , Yanlong Yang