Related papers: Compositional Inversion for Stable Diffusion Model…

Inversion-Free Style Transfer with Dual Rectified Flows

Style transfer, a pivotal task in image processing, synthesizes visually compelling images by seamlessly blending realistic content with artistic styles, enabling applications in photo editing and creative design. While mainstream…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-27 · Yingying Deng, Xiangyu He, Fan Tang, Weiming Dong, Xucheng Yin

Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models

Text-to-image generative models have enabled high-resolution image synthesis across different domains, but require users to specify the content they wish to generate. In this paper, we consider the inverse problem -- given a collection of…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-04 · Nan Liu, Yilun Du, Shuang Li, Joshua B. Tenenbaum, Antonio Torralba

CatVersion: Concatenating Embeddings for Diffusion-Based Text-to-Image Personalization

We propose CatVersion, an inversion-based method that learns the personalized concept through a handful of examples. Subsequently, users can utilize text prompts to generate images that embody the personalized concept, thereby achieving…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-01 · Ruoyu Zhao, Mingrui Zhu, Shiyin Dong, Nannan Wang, Xinbo Gao

Multi-Concept T2I-Zero: Tweaking Only The Text Embeddings and Nothing Else

Recent advances in text-to-image diffusion models have enabled the photorealistic generation of images from text prompts. Despite the great progress, existing models still struggle to generate compositional multi-concept images naturally,…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-12 · Hazarapet Tunanyan, Dejia Xu, Shant Navasardyan, Zhangyang Wang, Humphrey Shi

EDITOR: Effective and Interpretable Prompt Inversion for Text-to-Image Diffusion Models

Text-to-image generation models (e.g., Stable Diffusion) have achieved significant advancements, enabling the creation of high-quality and realistic images based on textual descriptions. Prompt inversion, the task of identifying the textual…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-06 · Mingzhe Li, Kejing Xia, Gehao Zhang, Zhenting Wang, Guanhong Tao, Siqi Pan, Juan Zhai, Shiqing Ma

Investigating Conceptual Blending of a Diffusion Model for Improving Nonword-to-Image Generation

Text-to-image diffusion models sometimes depict blended concepts in the generated images. One promising use case of this effect would be the nonword-to-image generation task which attempts to generate images intuitively imaginable from a…

Multimedia · Computer Science · 2024-11-07 · Chihaya Matsuhira, Marc A. Kastner, Takahiro Komamizu, Takatsugu Hirayama, Ichiro Ide

Comparison Reveals Commonality: Customized Image Generation through Contrastive Inversion

The recent demand for customized image generation raises a need for techniques that effectively extract the common concept from small sets of images. Existing methods typically rely on additional guidance, such as text prompts or spatial…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-12 · Minseo Kim, Minchan Kwon, Dongyeun Lee, Yunho Jeon, Junmo Kim

Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting

Text-to-image (T2I) customization aims to create images that embody specific visual concepts delineated in textual descriptions. However, existing works still face a main challenge, concept overfitting. To tackle this challenge, we first…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-23 · Weili Zeng, Yichao Yan, Qi Zhu, Zhuo Chen, Pengzhi Chu, Weiming Zhao, Xiaokang Yang

StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing

A significant research effort is focused on exploiting the amazing capacities of pretrained diffusion models for the editing of images. They either finetune the model, or invert the image in the latent space of the pretrained model. However,…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-09 · Senmao Li, Joost van de Weijer, Taihang Hu, Fahad Shahbaz Khan, Qibin Hou, Yaxing Wang, Jian Yang, Ming-Ming Cheng

VSC: Visual Search Compositional Text-to-Image Diffusion Model

Text-to-image diffusion models have shown impressive capabilities in generating realistic visuals from natural-language prompts, yet they often struggle with accurately binding attributes to corresponding objects, especially in prompts…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-05 · Do Huu Dat, Nam Hyeonu, Po-Yuan Mao, Tae-Hyun Oh

Multi-Concept Customization of Text-to-Image Diffusion

While generative models produce high-quality images of concepts learned from a large-scale database, a user often wishes to synthesize instantiations of their own concepts (for example, their family, pets, or items). Can we teach a model to…

Computer Vision and Pattern Recognition · Computer Science · 2023-06-21 · Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, Jun-Yan Zhu

Orthogonal Adaptation for Modular Customization of Diffusion Models

Customization techniques for text-to-image models have paved the way for a wide range of previously unattainable applications, enabling the generation of specific concepts across diverse contexts and styles. While existing methods…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-06 · Ryan Po, Guandao Yang, Kfir Aberman, Gordon Wetzstein

Highly Personalized Text Embedding for Image Manipulation by Stable Diffusion

Diffusion models have shown superior performance in image generation and manipulation, but the inherent stochasticity presents challenges in preserving and manipulating image content and identity. While previous approaches like DreamBooth…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-20 · Inhwa Han, Serin Yang, Taesung Kwon, Jong Chul Ye

Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method

We explore the oscillatory behavior observed in inversion methods applied to large-scale text-to-image diffusion models, with a focus on the "Flux" model. By employing a fixed-point-inspired iterative approach to invert real-world images,…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-19 · Yan Zheng, Zhenxiao Liang, Xiaoyan Cong, Lanqing Guo, Yuehao Wang, Peihao Wang, Zhangyang Wang

Taking a Deeper Look at the Inverse Compositional Algorithm

In this paper, we provide a modern synthesis of the classic inverse compositional algorithm for dense image alignment. We first discuss the assumptions made by this well-established technique, and subsequently propose to relax these…

Computer Vision and Pattern Recognition · Computer Science · 2019-04-09 · Zhaoyang Lv, Frank Dellaert, James M. Rehg, Andreas Geiger

Reverse Stable Diffusion: What prompt was used to generate this image?

Text-to-image diffusion models have recently attracted the interest of many researchers, and inverting the diffusion process can play an important role in better understanding the generative process and how to engineer prompts in order to…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-22 · Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Mubarak Shah

Textual Localization: Decomposing Multi-concept Images for Subject-Driven Text-to-Image Generation

Subject-driven text-to-image diffusion models empower users to tailor the model to new concepts absent in the pre-training dataset using a few sample images. However, prevalent subject-driven models primarily rely on single-concept input…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-16 · Junjie Shentu, Matthew Watson, Noura Al Moubayed

Null-text Inversion for Editing Real Images using Guided Diffusion Models

Recent text-guided diffusion models provide powerful image generation capabilities. Currently, a massive effort is given to enable the modification of these images using text only as means to offer intuitive and versatile editing. To edit a…

Computer Vision and Pattern Recognition · Computer Science · 2022-11-18 · Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, Daniel Cohen-Or

Cross-domain Compositing with Pretrained Diffusion Models

Diffusion models have enabled high-quality, conditional image editing capabilities. We propose to expand their arsenal, and demonstrate that off-the-shelf diffusion models can be used for a wide range of cross-domain compositing tasks.…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-26 · Roy Hachnochi, Mingrui Zhao, Nadav Orzech, Rinon Gal, Ali Mahdavi-Amiri, Daniel Cohen-Or, Amit Haim Bermano

Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models

Text-driven diffusion models have significantly advanced the image editing performance by using text prompts as inputs. One crucial step in text-driven image editing is to invert the original image into a latent noise code conditioned on…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-08 · Ruibin Li, Ruihuang Li, Song Guo, Lei Zhang