Related papers: Self-conditioned Embedding Diffusion for Text Gene…

TESS: Text-to-Text Self-Conditioned Simplex Diffusion

Diffusion models have emerged as a powerful paradigm for generation, obtaining strong performance in various continuous domains. However, applying continuous diffusion models to natural language remains challenging due to its discrete…

Computation and Language · Computer Science · 2024-02-22 · Rabeeh Karimi Mahabadi, Hamish Ivison, Jaesung Tae, James Henderson, Iz Beltagy, Matthew E. Peters, Arman Cohan

Latent Diffusion for Language Generation

Diffusion models have achieved great success in modeling continuous data modalities such as images, audio, and video, but have seen limited use in discrete domains such as language. Recent attempts to adapt diffusion to language have…

Computation and Language · Computer Science · 2023-11-08 · Justin Lovelace, Varsha Kishore, Chao Wan, Eliot Shekhtman, Kilian Q. Weinberger

AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation

In recent years, image generation has shown a great leap in performance, where diffusion models play a central role. Although generating high-quality images, such models are mainly conditioned on textual descriptions. This begs the…

Sound · Computer Science · 2023-05-23 · Guy Yariv, Itai Gat, Lior Wolf, Yossi Adi, Idan Schwartz

SeqDiffuSeq: Text Diffusion with Encoder-Decoder Transformers

Diffusion model, a new generative modelling paradigm, has achieved great success in image, audio, and video generation. However, considering the discrete categorical nature of text, it is not trivial to extend continuous diffusion models to…

Computation and Language · Computer Science · 2023-05-23 · Hongyi Yuan, Zheng Yuan, Chuanqi Tan, Fei Huang, Songfang Huang

A Reparameterized Discrete Diffusion Model for Text Generation

This work studies discrete diffusion probabilistic models with applications to natural language generation. We derive an alternative yet equivalent formulation of the sampling from discrete diffusion processes and leverage this insight to…

Computation and Language · Computer Science · 2024-08-05 · Lin Zheng, Jianbo Yuan, Lei Yu, Lingpeng Kong

Text Diffusion with Reinforced Conditioning

Diffusion models have demonstrated exceptional capability in generating high-quality images, videos, and audio. Due to their adaptiveness in iterative refinement, they provide a strong potential for achieving better non-autoregressive…

Computation and Language · Computer Science · 2024-02-26 · Yuxuan Liu, Tianchi Yang, Shaohan Huang, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang

Towards Latent Diffusion Suitable For Text

Language diffusion models aim to improve sampling speed and coherence over autoregressive LLMs. We introduce Neural Flow Diffusion Models for language generation, an extension of NFDM that enables the straightforward application of…

Computation and Language · Computer Science · 2026-01-26 · Nesta Midavaine, Christian A. Naesseth, Grigory Bartosh

Cosmos: Compressed and Smooth Latent Space for Text Diffusion Modeling

Autoregressive language models dominate modern text generation, yet their sequential nature introduces fundamental limitations: decoding is slow, and maintaining global coherence remains challenging. Diffusion models offer a promising…

Computation and Language · Computer Science · 2026-01-06 · Viacheslav Meshchaninov, Egor Chimbulatov, Alexander Shabalin, Aleksandr Abramov, Dmitry Vetrov

Constrained Discrete Diffusion

Discrete diffusion models are a class of generative models that construct sequences by progressively denoising samples from a categorical noise distribution. Beyond their rapidly growing ability to generate coherent natural language, these…

Computation and Language · Computer Science · 2025-12-11 · Michael Cardei, Jacob K Christopher, Thomas Hartvigsen, Bhavya Kailkhura, Ferdinando Fioretto

Multi-Concept Customization of Text-to-Image Diffusion

While generative models produce high-quality images of concepts learned from a large-scale database, a user often wishes to synthesize instantiations of their own concepts (for example, their family, pets, or items). Can we teach a model to…

Computer Vision and Pattern Recognition · Computer Science · 2023-06-21 · Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, Jun-Yan Zhu

DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation

While Diffusion Generative Models have achieved great success on image generation tasks, how to efficiently and effectively incorporate them into speech generation especially translation tasks remains a non-trivial problem. Specifically,…

Computation and Language · Computer Science · 2023-10-27 · Yongxin Zhu, Zhujin Gao, Xinyuan Zhou, Zhongyi Ye, Linli Xu

Unifying Continuous and Discrete Text Diffusion with Non-simultaneous Diffusion Processes

Diffusion models have emerged as a promising approach for text generation, with recent works falling into two main categories: discrete and continuous diffusion models. Discrete diffusion models apply token corruption independently using…

Computation and Language · Computer Science · 2025-05-29 · Bocheng Li, Zhujin Gao, Linli Xu

TEncDM: Understanding the Properties of the Diffusion Model in the Space of Language Model Encodings

This paper presents the Text Encoding Diffusion Model (TEncDM), a novel approach to diffusion modeling that operates in the space of pre-trained language model encodings. In contrast to traditionally used embeddings, encodings integrate…

Computation and Language · Computer Science · 2025-02-25 · Alexander Shabalin, Viacheslav Meshchaninov, Egor Chimbulatov, Vladislav Lapikov, Roman Kim, Grigory Bartosh, Dmitry Molchanov, Sergey Markov, Dmitry Vetrov

Empowering Diffusion Models on the Embedding Space for Text Generation

Diffusion models have achieved state-of-the-art synthesis quality on both visual and audio tasks, and recent works further adapt them to textual data by diffusing on the embedding space. In this paper, we conduct systematic studies of the…

Computation and Language · Computer Science · 2024-04-23 · Zhujin Gao, Junliang Guo, Xu Tan, Yongxin Zhu, Fang Zhang, Jiang Bian, Linli Xu

Smoothie: Smoothing Diffusion on Token Embeddings for Text Generation

Diffusion models have achieved state-of-the-art performance in generating images, audio, and video, but their adaptation to text remains challenging due to its discrete nature. Prior approaches either apply Gaussian diffusion in continuous…

Computation and Language · Computer Science · 2026-05-18 · Alexander Shabalin, Viacheslav Meshchaninov, Dmitry Vetrov

Diffusion Guided Language Modeling

Current language models demonstrate remarkable proficiency in text generation. However, for many applications it is desirable to control attributes, such as sentiment, or toxicity, of the generated language -- ideally tailored towards each…

Computation and Language · Computer Science · 2024-08-09 · Justin Lovelace, Varsha Kishore, Yiwei Chen, Kilian Q. Weinberger

DiffCap: Exploring Continuous Diffusion on Image Captioning

Current image captioning works usually focus on generating descriptions in an autoregressive manner. However, there are limited works that focus on generating descriptions non-autoregressively, which brings more decoding diversity. Inspired…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-23 · Yufeng He, Zefan Cai, Xu Gan, Baobao Chang

Towards Controllable Image Generation through Representation-Conditioned Diffusion Models

Diffusion models have emerged as powerful tools for high-quality image generation and editing, but guiding these models to produce specific outputs remains a challenge. Conventional approaches rely on conditioning mechanisms, such as text…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-27 · Nithesh Chandher Karthikeyan, Jonas Unger, Gabriel Eilertsen

Diffusion Self-Distillation for Zero-Shot Customized Image Generation

Text-to-image diffusion models produce impressive results but are frustrating tools for artists who desire fine-grained control. For example, a common use case is to create images of a specific instance in novel contexts, i.e.,…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-28 · Shengqu Cai, Eric Chan, Yunzhi Zhang, Leonidas Guibas, Jiajun Wu, Gordon Wetzstein

Investigating Conceptual Blending of a Diffusion Model for Improving Nonword-to-Image Generation

Text-to-image diffusion models sometimes depict blended concepts in the generated images. One promising use case of this effect would be the nonword-to-image generation task which attempts to generate images intuitively imaginable from a…

Multimedia · Computer Science · 2024-11-07 · Chihaya Matsuhira, Marc A. Kastner, Takahiro Komamizu, Takatsugu Hirayama, Ichiro Ide