Related papers: Localizing Object-level Shape Variations with Text…

Generate Anything Anywhere in Any Scene

Text-to-image diffusion models have attracted considerable interest due to their wide applicability across diverse fields. However, challenges persist in creating controllable models for personalized object generation. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science · 2023-06-30 · Yuheng Li, Haotian Liu, Yangming Wen, Yong Jae Lee

Object-level Visual Prompts for Compositional Image Generation

We introduce a method for composing object-level visual prompts within a text-to-image diffusion model. Our approach addresses the task of generating semantically coherent compositions across diverse scenes and styles, similar to the…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-03 · Gaurav Parmar, Or Patashnik, Kuan-Chieh Wang, Daniil Ostashev, Srinivasa Narasimhan, Jun-Yan Zhu, Daniel Cohen-Or, Kfir Aberman

From Text to Mask: Localizing Entities Using the Attention of Text-to-Image Diffusion Models

Diffusion models have recently revolutionized the field of text-to-image generation. The unique way they fuse text and image information contributes to their remarkable capability of generating highly text-related images. From another…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-02 · Changming Xiao, Qi Yang, Feng Zhou, Changshui Zhang

LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts

Diffusion-based generative models have significantly advanced text-to-image generation but encounter challenges when processing lengthy and intricate text prompts describing complex scenes with multiple objects. While excelling in…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-27 · Hanan Gani, Shariq Farooq Bhat, Muzammal Naseer, Salman Khan, Peter Wonka

Textual Localization: Decomposing Multi-concept Images for Subject-Driven Text-to-Image Generation

Subject-driven text-to-image diffusion models empower users to tailor the model to new concepts absent in the pre-training dataset using a few sample images. However, prevalent subject-driven models primarily rely on single-concept input…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-16 · Junjie Shentu, Matthew Watson, Noura Al Moubayed

Make It Count: Text-to-Image Generation with an Accurate Number of Objects

Despite the unprecedented success of text-to-image diffusion models, controlling the number of depicted objects using text is surprisingly hard. This is important for various applications, from technical documents to children's books to…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-17 · Lital Binyamin, Yoad Tewel, Hilit Segev, Eran Hirsch, Royi Rassin, Gal Chechik

Prompt Expansion for Adaptive Text-to-Image Generation

Text-to-image generation models are powerful but difficult to use. Users craft specific prompts to get better images, though the images can be repetitive. This paper proposes a Prompt Expansion framework that helps users generate…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-29 · Siddhartha Datta, Alexander Ku, Deepak Ramachandran, Peter Anderson

Local Conditional Controlling for Text-to-Image Diffusion Models

Diffusion models have exhibited impressive prowess in the text-to-image task. Recent methods add image-level structure controls, e.g., edge and depth maps, to manipulate the generation process together with text prompts to obtain desired…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-23 · Yibo Zhao, Liang Peng, Yang Yang, Zekai Luo, Hengjia Li, Yao Chen, Zheng Yang, Xiaofei He, Wei Zhao, Qinglin Lu, Boxi Wu, Wei Liu

Localized Control in Diffusion Models via Latent Vector Prediction

Diffusion models emerged as a leading approach in text-to-image generation, producing high-quality images from textual descriptions. However, attempting to achieve detailed control to get a desired image solely through text remains a…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-12 · Pablo Domingo-Gregorio, Javier Ruiz-Hidalgo

Leveraging Text-to-Image Diffusion Models for Unsupervised Visual Object Tracking

Unsupervised visual object tracking is a challenging task that requires following arbitrary targets in videos without training on ground-truth annotations. Despite considerable progress, existing state-of-the-art unsupervised trackers often…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-27 · Zhengbo Zhang, Zhigang Tu, Junsong Yuan, De Wen Soh, Bo Du

LocInv: Localization-aware Inversion for Text-Guided Image Editing

Large-scale Text-to-Image (T2I) diffusion models demonstrate significant generation capabilities based on textual prompts. Based on the T2I diffusion models, text-guided image editing research aims to empower users to manipulate generated…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-03 · Chuanming Tang, Kai Wang, Fei Yang, Joost van de Weijer

Customizing Text-to-Image Diffusion with Object Viewpoint Control

Model customization introduces new concepts to existing text-to-image models, enabling the generation of these new concepts/objects in novel contexts. However, such methods lack accurate camera view control with respect to the new object,…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-04 · Nupur Kumari, Grace Su, Richard Zhang, Taesung Park, Eli Shechtman, Jun-Yan Zhu

Training-Free Location-Aware Text-to-Image Synthesis

Current large-scale generative models have impressive efficiency in generating high-quality images based on text prompts. However, they lack the ability to precisely control the size and position of objects in the generated image. In this…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-27 · Jiafeng Mao, Xueting Wang

GLoD: Composing Global Contexts and Local Details in Image Generation

Diffusion models have demonstrated their capability to synthesize high-quality and diverse images from textual prompts. However, simultaneous control over both global contexts (e.g., object layouts and interactions) and local details (e.g.,…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-25 · Moyuru Yamada

Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synthesis

Diffusion-based models have achieved state-of-the-art performance on text-to-image synthesis tasks. However, one critical limitation of these models is the low fidelity of generated images with respect to the text description, such as…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-11 · Qiucheng Wu, Yujian Liu, Handong Zhao, Trung Bui, Zhe Lin, Yang Zhang, Shiyu Chang

DE-FAKE: Detection and Attribution of Fake Images Generated by Text-to-Image Generation Models

Text-to-image generation models that generate images based on prompt descriptions have attracted an increasing amount of attention during the past few months. Despite their encouraging performance, these models raise concerns about the…

Cryptography and Security · Computer Science · 2023-01-10 · Zeyang Sha, Zheng Li, Ning Yu, Yang Zhang

Sketch-Guided Scene Image Generation

Text-to-image models show an impressive ability to create high-quality and diverse images. Nevertheless, the transition from freehand sketches to complex scene images remains challenging using diffusion models. In this…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-10 · Tianyu Zhang, Xiaoxuan Xie, Xusheng Du, Haoran Xie

VSC: Visual Search Compositional Text-to-Image Diffusion Model

Text-to-image diffusion models have shown impressive capabilities in generating realistic visuals from natural-language prompts, yet they often struggle with accurately binding attributes to corresponding objects, especially in prompts…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-05 · Do Huu Dat, Nam Hyeonu, Po-Yuan Mao, Tae-Hyun Oh

ObjectComposer: Consistent Generation of Multiple Objects Without Fine-tuning

Recent text-to-image generative models can generate high-fidelity images from text prompts. However, these models struggle to consistently generate the same objects in different contexts with the same appearance. Consistent object…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-12 · Alec Helbling, Evan Montoya, Duen Horng Chau

Learning to Customize Text-to-Image Diffusion In Diverse Context

Most text-to-image customization techniques fine-tune models on a small set of "personal concept" images captured in minimal contexts. This often results in the model becoming overfitted to these training images and unable to…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-15 · Taewook Kim, Wei Chen, Qiang Qiu