Related papers: TextCraftor: Your Text Encoder Can be Image Qualit…

UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models

Text-to-Image (T2I) generation methods based on diffusion models have garnered significant attention in the last few years. Although these image synthesis methods produce visually appealing results, they frequently exhibit spelling errors…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-27 · Yiming Zhao, Zhouhui Lian

Enhancing Diffusion Models with Text-Encoder Reinforcement Learning

Text-to-image diffusion models are typically trained to optimize the log-likelihood objective, which presents challenges in meeting specific requirements for downstream tasks, such as image aesthetics and image-text alignment. Recent…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-18 · Chaofeng Chen, Annan Wang, Haoning Wu, Liang Liao, Wenxiu Sun, Qiong Yan, Weisi Lin

Improving Diffusion Models for Scene Text Editing with Dual Encoders

Scene text editing is a challenging task that involves modifying or inserting specified texts in an image while maintaining its natural and realistic appearance. Most previous approaches to this task rely on style-transfer models that crop…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-13 · Jiabao Ji, Guanhua Zhang, Zhaowen Wang, Bairu Hou, Zhifei Zhang, Brian Price, Shiyu Chang

CustomText: Customized Textual Image Generation using Diffusion Models

Textual image generation spans diverse fields like advertising, education, product packaging, social media, information visualization, and branding. Despite recent strides in language-guided image synthesis using diffusion models, current…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-22 · Shubham Paliwal, Arushi Jain, Monika Sharma, Vikram Jamwal, Lovekesh Vig

Recolour What Matters: Region-Aware Colour Editing via Token-Level Diffusion

Colour is one of the most perceptually salient yet least controllable attributes in image generation. Although recent diffusion models can modify object colours from user instructions, their results often deviate from the intended hue,…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-20 · Yuqi Yang, Dongliang Chang, Yijia Ling, Ruoyi Du, Zhanyu Ma

MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation

Recent advances in text-to-image generation with diffusion models present transformative capabilities in image quality. However, user controllability of the generated image and fast adaptation to new tasks still remain open challenges,…

Computer Vision and Pattern Recognition · Computer Science · 2023-02-17 · Omer Bar-Tal, Lior Yariv, Yaron Lipman, Tali Dekel

Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models

Text-to-Image diffusion models have made tremendous progress over the past two years, enabling the generation of highly realistic images based on open-domain text descriptions. However, despite their success, text descriptions often…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-31 · Shihao Zhao, Dongdong Chen, Yen-Chun Chen, Jianmin Bao, Shaozhe Hao, Lu Yuan, Kwan-Yee K. Wong

EditCrafter: Tuning-free High-Resolution Image Editing via Pretrained Diffusion Model

We propose EditCrafter, a high-resolution image editing method that operates without tuning, leveraging pretrained text-to-image (T2I) diffusion models to process images at resolutions significantly exceeding those used during training.…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-14 · Kunho Kim, Sumin Seo, Yongjun Cho, Hyungjin Chung

A Method for Training-free Person Image Picture Generation

The current state-of-the-art diffusion model has demonstrated excellent results in generating images. However, the images are monotonous and largely reflect the distribution of images of people in the training set, making it…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-18 · Tianyu Chen

TokenFlow: Consistent Diffusion Features for Consistent Video Editing

The generative AI revolution has recently expanded to videos. Nevertheless, current state-of-the-art video models are still lagging behind image models in terms of visual quality and user control over the generated content. In this work, we…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-21 · Michal Geyer, Omer Bar-Tal, Shai Bagon, Tali Dekel

Discriminative Class Tokens for Text-to-Image Diffusion Models

Recent advances in text-to-image diffusion models have enabled the generation of diverse and high-quality images. While impressive, the images often fall short of depicting subtle details and are susceptible to errors due to ambiguity in…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-13 · Idan Schwartz, Vésteinn Snæbjarnarson, Hila Chefer, Ryan Cotterell, Serge Belongie, Lior Wolf, Sagie Benaim

Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models

Generative models have been widely studied in computer vision. Recently, diffusion models have drawn substantial attention due to the high quality of their generated images. A key desired property of image generative models is the ability…

Computer Vision and Pattern Recognition · Computer Science · 2022-12-20 · Qiucheng Wu, Yujian Liu, Handong Zhao, Ajinkya Kale, Trung Bui, Tong Yu, Zhe Lin, Yang Zhang, Shiyu Chang

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

Text-to-video generation aims to produce a video based on a given prompt. Recently, several commercial video models have been able to generate plausible videos with minimal noise, excellent details, and high aesthetic scores. However, these…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-18 · Haoxin Chen, Yong Zhang, Xiaodong Cun, Menghan Xia, Xintao Wang, Chao Weng, Ying Shan

ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors

Recently, the multimedia community has witnessed the rise of diffusion models trained on large-scale multi-modal data for visual content creation, particularly in the field of text-to-image generation. In this paper, we propose a new task…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-10 · Jingwen Chen, Yingwei Pan, Ting Yao, Tao Mei

TextBoost: Boosting Text Encoder for Personalized Text-to-Image Generation

In this paper, we introduce TextBoost, an efficient one-shot personalization approach for text-to-image diffusion models. Traditional personalization methods typically involve fine-tuning extensive portions of the model, leading to…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-20 · NaHyeon Park, Kunhee Kim, Hyunjung Shim

Multi-Concept Customization of Text-to-Image Diffusion

While generative models produce high-quality images of concepts learned from a large-scale database, a user often wishes to synthesize instantiations of their own concepts (for example, their family, pets, or items). Can we teach a model to…

Computer Vision and Pattern Recognition · Computer Science · 2023-06-21 · Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, Jun-Yan Zhu

Scaling Down Text Encoders of Text-to-Image Diffusion Models

Text encoders in diffusion models have rapidly evolved, transitioning from CLIP to T5-XXL. Although this evolution has significantly enhanced the models' ability to understand complex prompts and generate text, it also leads to a…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-26 · Lifu Wang, Daqing Liu, Xinchen Liu, Xiaodong He

Your Diffusion Model is Secretly a Zero-Shot Classifier

The recent wave of large-scale text-to-image diffusion models has dramatically increased our text-based image generation abilities. These models can generate realistic images for a staggering variety of prompts and exhibit impressive…

Machine Learning · Computer Science · 2023-09-14 · Alexander C. Li, Mihir Prabhudesai, Shivam Duggal, Ellis Brown, Deepak Pathak

Trade-offs in Fine-tuned Diffusion Models Between Accuracy and Interpretability

Recent advancements in diffusion models have significantly impacted the trajectory of generative machine learning research, with many adopting the strategy of fine-tuning pre-trained models using domain-specific text-to-image datasets.…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-21 · Mischa Dombrowski, Hadrien Reynaud, Johanna P. Müller, Matthew Baugh, Bernhard Kainz

Are Diffusion Models Vision-And-Language Reasoners?

Text-conditioned image generation models have recently shown immense qualitative success using denoising diffusion processes. However, unlike discriminative vision-and-language models, it is a non-trivial task to subject these…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-06 · Benno Krojer, Elinor Poole-Dayan, Vikram Voleti, Christopher Pal, Siva Reddy