Related papers: Syntax-Guided Diffusion Language Models with User-…

Diffusion Guided Language Modeling

Current language models demonstrate remarkable proficiency in text generation. However, for many applications it is desirable to control attributes, such as sentiment or toxicity, of the generated language -- ideally tailored towards each…

Computation and Language · Computer Science 2024-08-09 Justin Lovelace, Varsha Kishore, Yiwei Chen, Kilian Q. Weinberger

GlyphDiffusion: Text Generation as Image Generation

Diffusion models have become a new generative paradigm for text generation. Considering the discrete categorical nature of text, in this paper, we propose GlyphDiffusion, a novel diffusion approach for text generation via text-guided image…

Computation and Language · Computer Science 2023-05-09 Junyi Li, Wayne Xin Zhao, Jian-Yun Nie, Ji-Rong Wen

Audience-Centric Natural Language Generation via Style Infusion

Adopting contextually appropriate, audience-tailored linguistic styles is critical to the success of user-centric language generation systems (e.g., chatbots, computer-aided writing, dialog systems). While existing approaches demonstrate…

Computation and Language · Computer Science 2023-01-26 Samraj Moorjani, Adit Krishnan, Hari Sundaram, Ewa Maslowska, Aravind Sankar

Contextualized Diffusion Models for Text-Guided Image and Video Generation

Conditional diffusion models have exhibited superior performance in high-fidelity text-guided visual generation and editing. Nevertheless, prevailing text-guided visual diffusion models primarily focus on incorporating text-visual…

Computer Vision and Pattern Recognition · Computer Science 2024-06-05 Ling Yang, Zhilong Zhang, Zhaochen Yu, Jingwei Liu, Minkai Xu, Stefano Ermon, Bin Cui

Self-conditioned Embedding Diffusion for Text Generation

Can continuous diffusion models bring the same performance breakthrough on natural language they did for image generation? To circumvent the discrete nature of text data, we can simply project tokens in a continuous space of embeddings, as…

Computation and Language · Computer Science 2022-11-09 Robin Strudel, Corentin Tallec, Florent Altché, Yilun Du, Yaroslav Ganin, Arthur Mensch, Will Grathwohl, Nikolay Savinov, Sander Dieleman, Laurent Sifre, Rémi Leblond

DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles

Human speech exhibits rich and flexible prosodic variations. To address the one-to-many mapping problem from text to prosody in a reasonable and flexible manner, we propose DiffStyleTTS, a multi-speaker acoustic model based on a conditional…

Sound · Computer Science 2024-12-05 Jiaxuan Liu, Zhaoci Liu, Yajun Hu, Yingying Gao, Shilei Zhang, Zhenhua Ling

DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models

Recently, diffusion models have emerged as a new paradigm for generative models. Despite the success in domains using continuous signals such as vision and audio, adapting diffusion models to natural language is under-explored due to the…

Computation and Language · Computer Science 2023-02-15 Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, Lingpeng Kong

DreamWalk: Style Space Exploration using Diffusion Guidance

Text-conditioned diffusion models can generate impressive images, but fall short when it comes to fine-grained control. Unlike direct-editing tools like Photoshop, text conditioned models require the artist to perform "prompt engineering,"…

Computer Vision and Pattern Recognition · Computer Science 2024-04-05 Michelle Shu, Charles Herrmann, Richard Strong Bowen, Forrester Cole, Ramin Zabih

Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis

Conditional generative models typically demand large annotated training sets to achieve high-quality synthesis. As a result, there has been significant interest in designing models that perform plug-and-play generation, i.e., to use a…

Computer Vision and Pattern Recognition · Computer Science 2023-10-03 Nithin Gopalakrishnan Nair, Anoop Cherian, Suhas Lohit, Ye Wang, Toshiaki Koike-Akino, Vishal M. Patel, Tim K. Marks

Instruct-SCTG: Guiding Sequential Controlled Text Generation through Instructions

Instruction-tuned large language models have shown remarkable performance in aligning generated text with user intentions across various tasks. However, maintaining human-like discourse structure in the generated text remains a challenging…

Computation and Language · Computer Science 2023-12-20 Yinhong Liu, Yixuan Su, Ehsan Shareghi, Nigel Collier

Structure and Content-Guided Video Synthesis with Diffusion Models

Text-guided generative diffusion models unlock powerful image creation and editing tools. While these have been extended to video generation, current approaches that edit the content of existing footage while retaining structure require…

Computer Vision and Pattern Recognition · Computer Science 2023-02-07 Patrick Esser, Johnathan Chiu, Parmida Atighehchian, Jonathan Granskog, Anastasis Germanidis

Self-Guided Diffusion Models

Diffusion models have demonstrated remarkable progress in image generation quality, especially when guidance is used to control the generative process. However, guidance requires a large amount of image-annotation pairs for training and is…

Computer Vision and Pattern Recognition · Computer Science 2023-11-28 Vincent Tao Hu, David W Zhang, Yuki M. Asano, Gertjan J. Burghouts, Cees G. M. Snoek

Language-driven Scene Synthesis using Multi-conditional Diffusion Model

Scene synthesis is a challenging problem with several industrial applications. Recently, substantial efforts have been directed to synthesize the scene using human motions, room layouts, or spatial graphs as the input. However, few studies…

Computer Vision and Pattern Recognition · Computer Science 2023-10-25 An Vuong, Minh Nhat Vu, Toan Tien Nguyen, Baoru Huang, Dzung Nguyen, Thieu Vo, Anh Nguyen

Structural Guidance for Transformer Language Models

Transformer-based language models pre-trained on large amounts of text data have proven remarkably successful in learning generic transferable linguistic representations. Here we study whether structural guidance leads to more human-like…

Computation and Language · Computer Science 2021-08-03 Peng Qian, Tahira Naseem, Roger Levy, Ramón Fernandez Astudillo

Syntax-driven Iterative Expansion Language Models for Controllable Text Generation

The dominant language modeling paradigm handles text as a sequence of discrete tokens. While that approach can capture the latent structure of the text, it is inherently constrained to sequential dynamics for text generation. We propose a…

Computation and Language · Computer Science 2020-11-02 Noe Casas, José A. R. Fonollosa, Marta R. Costa-jussà

More Control for Free! Image Synthesis with Semantic Diffusion Guidance

Controllable image synthesis models allow creation of diverse images based on text instructions or guidance from a reference image. Recently, denoising diffusion probabilistic models have been shown to generate more realistic imagery than…

Computer Vision and Pattern Recognition · Computer Science 2022-12-06 Xihui Liu, Dong Huk Park, Samaneh Azadi, Gong Zhang, Arman Chopikyan, Yuxiao Hu, Humphrey Shi, Anna Rohrbach, Trevor Darrell

SonicDiffusion: Audio-Driven Image Generation and Editing with Pretrained Diffusion Models

We are witnessing a revolution in conditional image synthesis with the recent success of large scale text-to-image generation methods. This success also opens up new opportunities in controlling the generation and editing process using…

Computer Vision and Pattern Recognition · Computer Science 2024-05-03 Burak Can Biner, Farrin Marouf Sofian, Umur Berkay Karakaş, Duygu Ceylan, Erkut Erdem, Aykut Erdem

DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models

In this paper, we present DesignDiffusion, a simple yet effective framework for the novel task of synthesizing design images from textual descriptions. A primary challenge lies in generating accurate and style-consistent textual and visual…

Computer Vision and Pattern Recognition · Computer Science 2025-03-04 Zhendong Wang, Jianmin Bao, Shuyang Gu, Dong Chen, Wengang Zhou, Houqiang Li

Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization

Diffusion-based text-to-image personalization methods have achieved great success in generating user-specified subjects in various contexts. Even so, existing finetuning-based methods still suffer from model overfitting, which greatly…

Computer Vision and Pattern Recognition · Computer Science 2024-01-31 Henglei Lv, Jiayu Xiao, Liang Li, Qingming Huang

Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance

We propose Guided-TTS, a high-quality text-to-speech (TTS) model that uses classifier guidance and does not require any transcript of the target speaker. Guided-TTS combines an unconditional diffusion probabilistic model with a separately trained…

Sound · Computer Science 2022-06-13 Heeseung Kim, Sungwon Kim, Sungroh Yoon