Related papers: Diffusion Adaptive Text Embedding for Text-to-Imag…

Self-conditioned Embedding Diffusion for Text Generation

Can continuous diffusion models bring the same performance breakthrough on natural language they did for image generation? To circumvent the discrete nature of text data, we can simply project tokens in a continuous space of embeddings, as…

Computation and Language · Computer Science 2022-11-09 Robin Strudel , Corentin Tallec , Florent Altché , Yilun Du , Yaroslav Ganin , Arthur Mensch , Will Grathwohl , Nikolay Savinov , Sander Dieleman , Laurent Sifre , Rémi Leblond

Training-free Diffusion Model Adaptation for Variable-Sized Text-to-Image Synthesis

Diffusion models (DMs) have recently gained attention with state-of-the-art performance in text-to-image synthesis. Abiding by the tradition in deep learning, DMs are trained and evaluated on the images with fixed sizes. However, users are…

Computer Vision and Pattern Recognition · Computer Science 2023-10-27 Zhiyu Jin , Xuli Shen , Bin Li , Xiangyang Xue

Instructing Text-to-Image Diffusion Models via Classifier-Guided Semantic Optimization

Text-to-image diffusion models have emerged as powerful tools for high-quality image generation and editing. Many existing approaches rely on text prompts as editing guidance. However, these methods are constrained by the need for manual…

Computer Vision and Pattern Recognition · Computer Science 2025-05-21 Yuanyuan Chang , Yinghua Yao , Tao Qin , Mengmeng Wang , Ivor Tsang , Guang Dai

Dynamic Prompt Optimizing for Text-to-Image Generation

Text-to-image generative models, specifically those based on diffusion models like Imagen and Stable Diffusion, have made substantial advancements. Recently, there has been a surge of interest in the delicate refinement of text prompts.…

Computer Vision and Pattern Recognition · Computer Science 2024-04-08 Wenyi Mo , Tianyu Zhang , Yalong Bai , Bing Su , Ji-Rong Wen , Qing Yang

Improving Diffusion Models for Scene Text Editing with Dual Encoders

Scene text editing is a challenging task that involves modifying or inserting specified texts in an image while maintaining its natural and realistic appearance. Most previous approaches to this task rely on style-transfer models that crop…

Computer Vision and Pattern Recognition · Computer Science 2023-04-13 Jiabao Ji , Guanhua Zhang , Zhaowen Wang , Bairu Hou , Zhifei Zhang , Brian Price , Shiyu Chang

Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synthesis

Diffusion-based models have achieved state-of-the-art performance on text-to-image synthesis tasks. However, one critical limitation of these models is the low fidelity of generated images with respect to the text description, such as…

Computer Vision and Pattern Recognition · Computer Science 2023-04-11 Qiucheng Wu , Yujian Liu , Handong Zhao , Trung Bui , Zhe Lin , Yang Zhang , Shiyu Chang

Fine Tuning Text-to-Image Diffusion Models for Correcting Anomalous Images

Since the advent of GANs and VAEs, image generation models have continuously evolved, opening up various real-world applications with the introduction of Stable Diffusion and DALL-E models. These text-to-image models can generate…

Computer Vision and Pattern Recognition · Computer Science 2024-09-25 Hyunwoo Yoo

Back to the Source: Diffusion-Driven Test-Time Adaptation

Test-time adaptation harnesses test inputs to improve the accuracy of a model trained on source data when tested on shifted target data. Existing methods update the source model by (re-)training on each target domain. While effective,…

Machine Learning · Computer Science 2023-06-22 Jin Gao , Jialing Zhang , Xihui Liu , Trevor Darrell , Evan Shelhamer , Dequan Wang

Diffusion-TTA: Test-time Adaptation of Discriminative Models via Generative Feedback

The advancements in generative modeling, particularly the advent of diffusion models, have sparked a fundamental question: how can these models be effectively used for discriminative tasks? In this work, we find that generative models can…

Computer Vision and Pattern Recognition · Computer Science 2023-12-01 Mihir Prabhudesai , Tsung-Wei Ke , Alexander C. Li , Deepak Pathak , Katerina Fragkiadaki

DiffUTE: Universal Text Editing Diffusion Model

Diffusion model based language-guided image editing has achieved great success recently. However, existing state-of-the-art diffusion models struggle with rendering correct text and text style during generation. To tackle this problem, we…

Computer Vision and Pattern Recognition · Computer Science 2023-10-19 Haoxing Chen , Zhuoer Xu , Zhangxuan Gu , Jun Lan , Xing Zheng , Yaohui Li , Changhua Meng , Huijia Zhu , Weiqiang Wang

Prompt Tuning Inversion for Text-Driven Image Editing Using Diffusion Models

Recently large-scale language-image models (e.g., text-guided diffusion models) have considerably improved the image generation capabilities to generate photorealistic images in various domains. Based on this success, current image editing…

Computer Vision and Pattern Recognition · Computer Science 2023-05-09 Wenkai Dong , Song Xue , Xiaoyue Duan , Shumin Han

Design Booster: A Text-Guided Diffusion Model for Image Translation with Spatial Layout Preservation

Diffusion models are able to generate photorealistic images in arbitrary scenes. However, when applying diffusion models to image translation, there exists a trade-off between maintaining spatial structure and high-quality content. Besides,…

Computer Vision and Pattern Recognition · Computer Science 2023-02-07 Shiqi Sun , Shancheng Fang , Qian He , Wei Liu

Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning

Personalized text-to-image models allow users to generate varied styles of images (specified with a sentence) for an object (specified with a set of reference images). While remarkable results have been achieved using diffusion-based…

Computer Vision and Pattern Recognition · Computer Science 2024-07-19 Fanyue Wei , Wei Zeng , Zhenyang Li , Dawei Yin , Lixin Duan , Wen Li

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-quality and diverse synthesis of images from a given text prompt. However, these models lack the ability to mimic the appearance of subjects in a…

Computer Vision and Pattern Recognition · Computer Science 2023-03-16 Nataniel Ruiz , Yuanzhen Li , Varun Jampani , Yael Pritch , Michael Rubinstein , Kfir Aberman

Editing Implicit Assumptions in Text-to-Image Diffusion Models

Text-to-image diffusion models often make implicit assumptions about the world when generating images. While some assumptions are useful (e.g., the sky is blue), they can also be outdated, incorrect, or reflective of social biases present…

Computer Vision and Pattern Recognition · Computer Science 2023-08-28 Hadas Orgad , Bahjat Kawar , Yonatan Belinkov

Zero-Shot Personalization of Objects via Textual Inversion

Recent advances in text-to-image diffusion models have substantially improved the quality of image customization, enabling the synthesis of highly realistic images. Despite this progress, achieving fast and efficient personalization remains…

Computer Vision and Pattern Recognition · Computer Science 2026-03-25 Aniket Roy , Maitreya Suin , Rama Chellappa

eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis. Starting from random noise, such text-to-image diffusion models gradually synthesize images in an iterative fashion…

Computer Vision and Pattern Recognition · Computer Science 2023-03-15 Yogesh Balaji , Seungjun Nah , Xun Huang , Arash Vahdat , Jiaming Song , Qinsheng Zhang , Karsten Kreis , Miika Aittala , Timo Aila , Samuli Laine , Bryan Catanzaro , Tero Karras , Ming-Yu Liu

Multi-Concept Customization of Text-to-Image Diffusion

While generative models produce high-quality images of concepts learned from a large-scale database, a user often wishes to synthesize instantiations of their own concepts (for example, their family, pets, or items). Can we teach a model to…

Computer Vision and Pattern Recognition · Computer Science 2023-06-21 Nupur Kumari , Bingliang Zhang , Richard Zhang , Eli Shechtman , Jun-Yan Zhu

Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners

Diffusion models, such as Stable Diffusion, have shown incredible performance on text-to-image generation. Since text-to-image generation often requires models to generate visual concepts with fine-grained details and attributes specified…

Computer Vision and Pattern Recognition · Computer Science 2024-04-26 Xuehai He , Weixi Feng , Tsu-Jui Fu , Varun Jampani , Arjun Akula , Pradyumna Narayana , Sugato Basu , William Yang Wang , Xin Eric Wang

ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation

Large-scale text-to-image models have demonstrated amazing ability to synthesize diverse and high-fidelity images. However, these models are often violated by several limitations. Firstly, they require the user to provide precise and…

Computer Vision and Pattern Recognition · Computer Science 2023-05-09 Yupei Lin , Sen Zhang , Xiaojun Yang , Xiao Wang , Yukai Shi