Related papers: InstanceDiffusion: Instance-level Control for Imag…

LocRef-Diffusion: Tuning-Free Layout and Appearance-Guided Generation

Recently, text-to-image models based on diffusion have achieved remarkable success in generating high-quality images. However, the challenge of personalized, controllable generation of instances within these images remains an area in need…

Computer Vision and Pattern Recognition · Computer Science 2024-11-26 Fan Deng, Yaguang Wu, Xinyang Yu, Xiangjun Huang, Jian Yang, Guangyu Yan, Qiang Xu

LLM-guided Instance-level Image Manipulation with Diffusion U-Net Cross-Attention Maps

The advancement of text-to-image synthesis has introduced powerful generative models capable of creating realistic images from textual prompts. However, precise control over image attributes remains challenging, especially at the instance…

Computer Vision and Pattern Recognition · Computer Science 2025-01-27 Andrey Palaev, Adil Khan, Syed M. Ahsan Kazmi

Local Conditional Controlling for Text-to-Image Diffusion Models

Diffusion models have exhibited impressive prowess in the text-to-image task. Recent methods add image-level structure controls, e.g., edge and depth maps, to manipulate the generation process together with text prompts to obtain desired…

Computer Vision and Pattern Recognition · Computer Science 2024-08-23 Yibo Zhao, Liang Peng, Yang Yang, Zekai Luo, Hengjia Li, Yao Chen, Zheng Yang, Xiaofei He, Wei Zhao, qinglin lu, Boxi Wu, Wei Liu

Generating Compositional Scenes via Text-to-image RGBA Instance Generation

Text-to-image diffusion generative models can generate high quality images at the cost of tedious prompt engineering. Controllability can be improved by introducing layout conditioning, however existing methods lack layout editing ability…

Computer Vision and Pattern Recognition · Computer Science 2024-11-19 Alessandro Fontanella, Petru-Daniel Tudosiu, Yongxin Yang, Shifeng Zhang, Sarah Parisot

TokenCompose: Text-to-Image Diffusion with Token-level Supervision

We present TokenCompose, a Latent Diffusion Model for text-to-image generation that achieves enhanced consistency between user-specified text prompts and model-generated images. Despite its tremendous success, the standard denoising process…

Computer Vision and Pattern Recognition · Computer Science 2024-06-25 Zirui Wang, Zhizhou Sha, Zheng Ding, Yilin Wang, Zhuowen Tu

Context Diffusion: In-Context Aware Image Generation

We propose Context Diffusion, a diffusion-based framework that enables image generation models to learn from visual examples presented in context. Recent work tackles such in-context learning for image generation, where a query image is…

Computer Vision and Pattern Recognition · Computer Science 2025-07-24 Ivona Najdenkoska, Animesh Sinha, Abhimanyu Dubey, Dhruv Mahajan, Vignesh Ramanathan, Filip Radenovic

MaskDiffusion: Boosting Text-to-Image Consistency with Conditional Mask

Recent advancements in diffusion models have showcased their impressive capacity to generate visually striking images. Nevertheless, ensuring a close match between the generated image and the given prompt remains a persistent challenge. In…

Computer Vision and Pattern Recognition · Computer Science 2023-09-11 Yupeng Zhou, Daquan Zhou, Zuo-Liang Zhu, Yaxing Wang, Qibin Hou, Jiashi Feng

MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation

Recent advances in text-to-image generation with diffusion models present transformative capabilities in image quality. However, user controllability of the generated image, and fast adaptation to new tasks still remains an open challenge,…

Computer Vision and Pattern Recognition · Computer Science 2023-02-17 Omer Bar-Tal, Lior Yariv, Yaron Lipman, Tali Dekel

DiffusionInst: Diffusion Model for Instance Segmentation

Diffusion frameworks have achieved comparable performance with previous state-of-the-art image generation models. Researchers are curious about its variants in discriminative tasks because of its powerful noise-to-image denoising pipeline.…

Computer Vision and Pattern Recognition · Computer Science 2022-12-29 Zhangxuan Gu, Haoxing Chen, Zhuoer Xu, Jun Lan, Changhua Meng, Weiqiang Wang

InstanceV: Instance-Level Video Generation

Recent advances in text-to-video diffusion models have enabled the generation of high-quality videos conditioned on textual descriptions. However, most existing text-to-video models rely solely on textual conditions, lacking general…

Computer Vision and Pattern Recognition · Computer Science 2025-12-01 Yuheng Chen, Teng Hu, Jiangning Zhang, Zhucun Xue, Ran Yi, Lizhuang Ma

Mask-ControlNet: Higher-Quality Image Generation with An Additional Mask Prompt

Text-to-image generation has witnessed great progress, especially with the recent advancements in diffusion models. Since texts cannot provide detailed conditions like object appearance, reference images are usually leveraged for the…

Computer Vision and Pattern Recognition · Computer Science 2024-04-09 Zhiqi Huang, Huixin Xiong, Haoyu Wang, Longguang Wang, Zhiheng Li

IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation

While Text-to-Image (T2I) diffusion models excel at generating visually appealing images of individual instances, they struggle to accurately position and control the features generation of multiple instances. The Layout-to-Image (L2I) task…

Computer Vision and Pattern Recognition · Computer Science 2024-11-07 Yinwei Wu, Xianpan Zhou, Bing Ma, Xuefeng Su, Kai Ma, Xinchao Wang

Grounding Text-to-Image Diffusion Models for Controlled High-Quality Image Generation

Text-to-image (T2I) generative diffusion models have demonstrated outstanding performance in synthesizing diverse, high-quality visuals from text captions. Several layout-to-image models have been developed to control the generation process…

Computer Vision and Pattern Recognition · Computer Science 2025-02-11 Ahmad Süleyman, Göksel Biricik

Directed Diffusion: Direct Control of Object Placement through Attention Guidance

Text-guided diffusion models such as DALLE-2, Imagen, eDiff-I, and Stable Diffusion are able to generate an effectively endless variety of images given only a short text prompt describing the desired image content. In many cases the images…

Computer Vision and Pattern Recognition · Computer Science 2023-09-27 Wan-Duo Kurt Ma, J. P. Lewis, Avisek Lahiri, Thomas Leung, W. Bastiaan Kleijn

Your Diffusion Model is Secretly a Zero-Shot Classifier

The recent wave of large-scale text-to-image diffusion models has dramatically increased our text-based image generation abilities. These models can generate realistic images for a staggering variety of prompts and exhibit impressive…

Machine Learning · Computer Science 2023-09-14 Alexander C. Li, Mihir Prabhudesai, Shivam Duggal, Ellis Brown, Deepak Pathak

Customizing Text-to-Image Diffusion with Object Viewpoint Control

Model customization introduces new concepts to existing text-to-image models, enabling the generation of these new concepts/objects in novel contexts. However, such methods lack accurate camera view control with respect to the new object,…

Computer Vision and Pattern Recognition · Computer Science 2024-12-04 Nupur Kumari, Grace Su, Richard Zhang, Taesung Park, Eli Shechtman, Jun-Yan Zhu

MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation

We present MosaicFusion, a simple yet effective diffusion-based data augmentation approach for large vocabulary instance segmentation. Our method is training-free and does not rely on any label supervision. Two key designs enable us to…

Computer Vision and Pattern Recognition · Computer Science 2024-10-07 Jiahao Xie, Wei Li, Xiangtai Li, Ziwei Liu, Yew Soon Ong, Chen Change Loy

Scribble-Guided Diffusion for Training-free Text-to-Image Generation

Recent advancements in text-to-image diffusion models have demonstrated remarkable success, yet they often struggle to fully capture the user's intent. Existing approaches using textual inputs combined with bounding boxes or region masks…

Computer Vision and Pattern Recognition · Computer Science 2024-09-13 Seonho Lee, Jiho Choi, Seohyun Lim, Jiwook Kim, Hyunjung Shim

Localized Control in Diffusion Models via Latent Vector Prediction

Diffusion models emerged as a leading approach in text-to-image generation, producing high-quality images from textual descriptions. However, attempting to achieve detailed control to get a desired image solely through text remains a…

Computer Vision and Pattern Recognition · Computer Science 2026-02-12 Pablo Domingo-Gregorio, Javier Ruiz-Hidalgo

Discriminative Class Tokens for Text-to-Image Diffusion Models

Recent advances in text-to-image diffusion models have enabled the generation of diverse and high-quality images. While impressive, the images often fall short of depicting subtle details and are susceptible to errors due to ambiguity in…

Computer Vision and Pattern Recognition · Computer Science 2025-01-13 Idan Schwartz, Vésteinn Snæbjarnarson, Hila Chefer, Ryan Cotterell, Serge Belongie, Lior Wolf, Sagie Benaim