Related papers: Precise Parameter Localization for Textual Generat…

Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models

Recent advancements in diffusion models have notably improved the perceptual quality of generated images in text-to-image synthesis tasks. However, diffusion models often struggle to produce images that accurately reflect the intended…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-12 · Yang Zhang, Teoh Tze Tzun, Lim Wei Hern, Tiviatis Sim, Kenji Kawaguchi

Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synthesis

Diffusion-based models have achieved state-of-the-art performance on text-to-image synthesis tasks. However, one critical limitation of these models is the low fidelity of generated images with respect to the text description, such as…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-11 · Qiucheng Wu, Yujian Liu, Handong Zhao, Trung Bui, Zhe Lin, Yang Zhang, Shiyu Chang

From Text to Mask: Localizing Entities Using the Attention of Text-to-Image Diffusion Models

Diffusion models have recently revolutionized the field of text-to-image generation. Their unique way of fusing text and image information contributes to their remarkable capability of generating highly text-related images. From another…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-02 · Changming Xiao, Qi Yang, Feng Zhou, Changshui Zhang

Localized Control in Diffusion Models via Latent Vector Prediction

Diffusion models emerged as a leading approach in text-to-image generation, producing high-quality images from textual descriptions. However, attempting to achieve detailed control to get a desired image solely through text remains a…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-12 · Pablo Domingo-Gregorio, Javier Ruiz-Hidalgo

Localizing and Editing Knowledge in Text-to-Image Generative Models

Text-to-Image Diffusion Models such as Stable-Diffusion and Imagen have achieved unprecedented quality of photorealism with state-of-the-art FID scores on MS-COCO and other generation benchmarks. Given a caption, image generation requires…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-24 · Samyadeep Basu, Nanxuan Zhao, Vlad Morariu, Soheil Feizi, Varun Manjunatha

UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models

Text-to-Image (T2I) generation methods based on diffusion models have garnered significant attention in the last few years. Although these image synthesis methods produce visually appealing results, they frequently exhibit spelling errors…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-27 · Yiming Zhao, Zhouhui Lian

LocInv: Localization-aware Inversion for Text-Guided Image Editing

Large-scale Text-to-Image (T2I) diffusion models demonstrate significant generation capabilities based on textual prompts. Based on the T2I diffusion models, text-guided image editing research aims to empower users to manipulate generated…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-03 · Chuanming Tang, Kai Wang, Fei Yang, Joost van de Weijer

Textual Localization: Decomposing Multi-concept Images for Subject-Driven Text-to-Image Generation

Subject-driven text-to-image diffusion models empower users to tailor the model to new concepts absent in the pre-training dataset using a few sample images. However, prevalent subject-driven models primarily rely on single-concept input…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-16 · Junjie Shentu, Matthew Watson, Noura Al Moubayed

Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model

Recently, diffusion-based image generation methods have been credited with remarkable text-to-image generation capabilities, while still facing challenges in accurately generating multilingual scene text images. To tackle this problem, we…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-20 · Lingjun Zhang, Xinyuan Chen, Yaohui Wang, Yue Lu, Yu Qiao

Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning

Personalized text-to-image models allow users to generate varied styles of images (specified with a sentence) for an object (specified with a set of reference images). While remarkable results have been achieved using diffusion-based…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-19 · Fanyue Wei, Wei Zeng, Zhenyang Li, Dawei Yin, Lixin Duan, Wen Li

Structured Pattern Expansion with Diffusion Models

Recent advances in diffusion models have significantly improved the synthesis of materials, textures, and 3D shapes. By conditioning these models via text or images, users can guide the generation, reducing the time required to create…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-28 · Marzia Riso, Giuseppe Vecchio, Fabio Pellacini

CustomText: Customized Textual Image Generation using Diffusion Models

Textual image generation spans diverse fields like advertising, education, product packaging, social media, information visualization, and branding. Despite recent strides in language-guided image synthesis using diffusion models, current…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-22 · Shubham Paliwal, Arushi Jain, Monika Sharma, Vikram Jamwal, Lovekesh Vig

Dense Text-to-Image Generation with Attention Modulation

Existing text-to-image diffusion models struggle to synthesize realistic images given dense captions, where each text prompt provides a detailed description for a specific image region. To address this, we propose DenseDiffusion, a…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-25 · Yunji Kim, Jiyoung Lee, Jin-Hwa Kim, Jung-Woo Ha, Jun-Yan Zhu

Your Diffusion Model is Secretly a Zero-Shot Classifier

The recent wave of large-scale text-to-image diffusion models has dramatically increased our text-based image generation abilities. These models can generate realistic images for a staggering variety of prompts and exhibit impressive…

Machine Learning · Computer Science · 2023-09-14 · Alexander C. Li, Mihir Prabhudesai, Shivam Duggal, Ellis Brown, Deepak Pathak

LIME: Localized Image Editing via Attention Regularization in Diffusion Models

Diffusion models (DMs) have gained prominence due to their ability to generate high-quality, varied images with recent advancements in text-to-image generation. The research focus is now shifting towards the controllability of DMs. A…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-06 · Enis Simsar, Alessio Tonioni, Yongqin Xian, Thomas Hofmann, Federico Tombari

Local Conditional Controlling for Text-to-Image Diffusion Models

Diffusion models have exhibited impressive prowess in the text-to-image task. Recent methods add image-level structure controls, e.g., edge and depth maps, to manipulate the generation process together with text prompts to obtain desired…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-23 · Yibo Zhao, Liang Peng, Yang Yang, Zekai Luo, Hengjia Li, Yao Chen, Zheng Yang, Xiaofei He, Wei Zhao, Qinglin Lu, Boxi Wu, Wei Liu

STAY Diffusion: Styled Layout Diffusion Model for Diverse Layout-to-Image Generation

In layout-to-image (L2I) synthesis, controlled complex scenes are generated from coarse information like bounding boxes. Such a task is exciting to many downstream applications because the input layouts offer strong guidance to the…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-18 · Ruyu Wang, Xuefeng Hou, Sabrina Schmedding, Marco F. Huber

Grounding Text-to-Image Diffusion Models for Controlled High-Quality Image Generation

Text-to-image (T2I) generative diffusion models have demonstrated outstanding performance in synthesizing diverse, high-quality visuals from text captions. Several layout-to-image models have been developed to control the generation process…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-11 · Ahmad Süleyman, Göksel Biricik

Grounded Text-to-Image Synthesis with Attention Refocusing

Driven by the scalable diffusion models trained on large-scale datasets, text-to-image synthesis methods have shown compelling results. However, these models still fail to precisely follow the text prompt involving multiple objects,…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-06 · Quynh Phung, Songwei Ge, Jia-Bin Huang

IMAGE-ALCHEMY: Advancing subject fidelity in personalised text-to-image generation

Recent advances in text-to-image diffusion models, particularly Stable Diffusion, have enabled the generation of highly detailed and semantically rich images. However, personalizing these models to represent novel subjects based on a few…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-19 · Amritanshu Tiwari, Cherish Puniani, Kaustubh Sharma, Ojasva Nema