Related papers: Per-Query Visual Concept Learning

A Comprehensive Survey on Visual Concept Mining in Text-to-image Diffusion Models

Text-to-image diffusion models have made significant advancements in generating high-quality, diverse images from text prompts. However, the inherent limitations of textual signals often prevent these models from fully capturing specific…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Ziqiang Li, Jun Li, Lizhi Xiong, Zhangjie Fu, Zechao Li

Unleashing Text-to-Image Diffusion Models for Visual Perception

Diffusion models (DMs) have become the new trend of generative models and have demonstrated a powerful ability of conditional synthesis. Among those, text-to-image diffusion models pre-trained on large-scale image-text pairs are highly…

Computer Vision and Pattern Recognition · Computer Science 2023-03-06 Wenliang Zhao, Yongming Rao, Zuyan Liu, Benlin Liu, Jie Zhou, Jiwen Lu

Learning to Customize Text-to-Image Diffusion In Diverse Context

Most text-to-image customization techniques fine-tune models on a small set of personal concept images captured in minimal contexts. This often results in the model becoming overfitted to these training images and unable to…

Computer Vision and Pattern Recognition · Computer Science 2024-10-15 Taewook Kim, Wei Chen, Qiang Qiu

MyVLM: Personalizing VLMs for User-Specific Queries

Recent large-scale vision-language models (VLMs) have demonstrated remarkable capabilities in understanding and generating textual descriptions for visual content. However, these models lack an understanding of user-specific concepts. In…

Computer Vision and Pattern Recognition · Computer Science 2024-03-22 Yuval Alaluf, Elad Richardson, Sergey Tulyakov, Kfir Aberman, Daniel Cohen-Or

Semantic Anchoring for Robust Personalization in Text-to-Image Diffusion Models

Text-to-image diffusion models have achieved remarkable progress in generating diverse and realistic images from textual descriptions. However, they still struggle with personalization, which requires adapting a pretrained model to depict…

Computer Vision and Pattern Recognition · Computer Science 2025-12-01 Seoyun Yang, Gihoon Kim, Taesup Kim

Discriminative-Generative Custom Tokens for Vision-Language Models

This paper explores the possibility of learning custom tokens for representing new concepts in Vision-Language Models (VLMs). Our aim is to learn tokens that can be effective for both discriminative and generative tasks while composing well…

Computer Vision and Pattern Recognition · Computer Science 2025-02-18 Pramuditha Perera, Matthew Trager, Luca Zancato, Alessandro Achille, Stefano Soatto

Textual Localization: Decomposing Multi-concept Images for Subject-Driven Text-to-Image Generation

Subject-driven text-to-image diffusion models empower users to tailor the model to new concepts absent in the pre-training dataset using a few sample images. However, prevalent subject-driven models primarily rely on single-concept input…

Computer Vision and Pattern Recognition · Computer Science 2024-02-16 Junjie Shentu, Matthew Watson, Noura Al Moubayed

Multi-Concept Customization of Text-to-Image Diffusion

While generative models produce high-quality images of concepts learned from a large-scale database, a user often wishes to synthesize instantiations of their own concepts (for example, their family, pets, or items). Can we teach a model to…

Computer Vision and Pattern Recognition · Computer Science 2023-06-21 Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, Jun-Yan Zhu

Encoder-based Domain Tuning for Fast Personalization of Text-to-Image Models

Text-to-image personalization aims to teach a pre-trained diffusion model to reason about novel, user provided concepts, embedding them into new scenes guided by natural language prompts. However, current personalization approaches struggle…

Computer Vision and Pattern Recognition · Computer Science 2023-03-07 Rinon Gal, Moab Arar, Yuval Atzmon, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or

VSC: Visual Search Compositional Text-to-Image Diffusion Model

Text-to-image diffusion models have shown impressive capabilities in generating realistic visuals from natural-language prompts, yet they often struggle with accurately binding attributes to corresponding objects, especially in prompts…

Computer Vision and Pattern Recognition · Computer Science 2025-05-05 Do Huu Dat, Nam Hyeonu, Po-Yuan Mao, Tae-Hyun Oh

Visual Concept-driven Image Generation with Text-to-Image Diffusion Model

Text-to-image (TTI) diffusion models have demonstrated impressive results in generating high-resolution images of complex and imaginative scenes. Recent approaches have further extended these methods with personalization techniques that…

Computer Vision and Pattern Recognition · Computer Science 2025-05-05 Tanzila Rahman, Shweta Mahajan, Hsin-Ying Lee, Jian Ren, Sergey Tulyakov, Leonid Sigal

Mod-Adapter: Tuning-Free and Versatile Multi-concept Personalization via Modulation Adapter

Personalized text-to-image generation aims to synthesize images of user-provided concepts in diverse contexts. Despite recent progress in multi-concept personalization, most are limited to object concepts and struggle to customize abstract…

Computer Vision and Pattern Recognition · Computer Science 2026-02-23 Weizhi Zhong, Huan Yang, Zheng Liu, Huiguo He, Zijian He, Xuesong Niu, Di Zhang, Guanbin Li

If you can describe it, they can see it: Cross-Modal Learning of Visual Concepts from Textual Descriptions

Humans can visualize new and unknown concepts from their natural language description, based on their experience and previous knowledge. Inspired by this, we present a way to extend this ability to Vision-Language Models (VLMs), teaching…

Computer Vision and Pattern Recognition · Computer Science 2025-12-18 Carlo Alberto Barbano, Luca Molinaro, Massimiliano Ciranni, Emanuele Aiello, Vito Paolo Pastore, Marco Grangetto

Efficient Personalized Text-to-image Generation by Leveraging Textual Subspace

Personalized text-to-image generation has attracted unprecedented attention in the recent few years due to its unique capability of generating highly-personalized images via using the input concept dataset and novel textual prompt. However,…

Artificial Intelligence · Computer Science 2024-07-02 Shian Du, Xiaotian Cheng, Qi Qian, Henglu Wei, Yi Xu, Xiangyang Ji

Zero-Shot Personalization of Objects via Textual Inversion

Recent advances in text-to-image diffusion models have substantially improved the quality of image customization, enabling the synthesis of highly realistic images. Despite this progress, achieving fast and efficient personalization remains…

Computer Vision and Pattern Recognition · Computer Science 2026-03-25 Aniket Roy, Maitreya Suin, Rama Chellappa

Language-Informed Visual Concept Learning

Our understanding of the visual world is centered around various concept axes, characterizing different aspects of visual entities. While different concept axes can be easily specified by language, e.g. color, the exact visual nuances along…

Computer Vision and Pattern Recognition · Computer Science 2024-04-04 Sharon Lee, Yunzhi Zhang, Shangzhe Wu, Jiajun Wu

Visual Concepts Tokenization

Obtaining the human-like perception ability of abstracting visual concepts from concrete pixels has always been a fundamental and important target in machine learning research fields such as disentangled representation learning and scene…

Computer Vision and Pattern Recognition · Computer Science 2022-10-14 Tao Yang, Yuwang Wang, Yan Lu, Nanning Zheng

Cross-Modal Concept Learning and Inference for Vision-Language Models

Large-scale pre-trained Vision-Language Models (VLMs), such as CLIP, establish the correlation between texts and images, achieving remarkable success on various downstream tasks with fine-tuning. In existing fine-tuning methods, the…

Computer Vision and Pattern Recognition · Computer Science 2023-07-31 Yi Zhang, Ce Zhang, Yushun Tang, Zhihai He

Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models

While there has been significant progress in customizing text-to-image generation models, generating images that combine multiple personalized concepts remains challenging. In this work, we introduce Concept Weaver, a method for composing…

Computer Vision and Pattern Recognition · Computer Science 2024-04-08 Gihyun Kwon, Simon Jenni, Dingzeyu Li, Joon-Young Lee, Jong Chul Ye, Fabian Caba Heilbron

MC-LLaVA: Multi-Concept Personalized Vision-Language Model

Current vision-language models (VLMs) show exceptional abilities across diverse tasks, such as visual question answering. To enhance user experience, recent studies investigate VLM personalization to understand user-provided concepts.…

Computer Vision and Pattern Recognition · Computer Science 2025-03-26 Ruichuan An, Sihan Yang, Ming Lu, Renrui Zhang, Kai Zeng, Yulin Luo, Jiajun Cao, Hao Liang, Ying Chen, Qi She, Shanghang Zhang, Wentao Zhang