Related papers: Per-Query Visual Concept Learning

200 papers

Text-to-image diffusion models have made significant advancements in generating high-quality, diverse images from text prompts. However, the inherent limitations of textual signals often prevent these models from fully capturing specific…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Ziqiang Li , Jun Li , Lizhi Xiong , Zhangjie Fu , Zechao Li

Diffusion models (DMs) have become the new trend of generative models and have demonstrated a powerful ability of conditional synthesis. Among those, text-to-image diffusion models pre-trained on large-scale image-text pairs are highly…

Computer Vision and Pattern Recognition · Computer Science 2023-03-06 Wenliang Zhao , Yongming Rao , Zuyan Liu , Benlin Liu , Jie Zhou , Jiwen Lu

Most text-to-image customization techniques fine-tune models on a small set of personal concept images captured in minimal contexts. This often results in the model becoming overfitted to these training images and unable to…

Computer Vision and Pattern Recognition · Computer Science 2024-10-15 Taewook Kim , Wei Chen , Qiang Qiu

Recent large-scale vision-language models (VLMs) have demonstrated remarkable capabilities in understanding and generating textual descriptions for visual content. However, these models lack an understanding of user-specific concepts. In…

Computer Vision and Pattern Recognition · Computer Science 2024-03-22 Yuval Alaluf , Elad Richardson , Sergey Tulyakov , Kfir Aberman , Daniel Cohen-Or

Text-to-image diffusion models have achieved remarkable progress in generating diverse and realistic images from textual descriptions. However, they still struggle with personalization, which requires adapting a pretrained model to depict…

Computer Vision and Pattern Recognition · Computer Science 2025-12-01 Seoyun Yang , Gihoon Kim , Taesup Kim

This paper explores the possibility of learning custom tokens for representing new concepts in Vision-Language Models (VLMs). Our aim is to learn tokens that can be effective for both discriminative and generative tasks while composing well…

Computer Vision and Pattern Recognition · Computer Science 2025-02-18 Pramuditha Perera , Matthew Trager , Luca Zancato , Alessandro Achille , Stefano Soatto

Subject-driven text-to-image diffusion models empower users to tailor the model to new concepts absent in the pre-training dataset using a few sample images. However, prevalent subject-driven models primarily rely on single-concept input…

Computer Vision and Pattern Recognition · Computer Science 2024-02-16 Junjie Shentu , Matthew Watson , Noura Al Moubayed

While generative models produce high-quality images of concepts learned from a large-scale database, a user often wishes to synthesize instantiations of their own concepts (for example, their family, pets, or items). Can we teach a model to…

Computer Vision and Pattern Recognition · Computer Science 2023-06-21 Nupur Kumari , Bingliang Zhang , Richard Zhang , Eli Shechtman , Jun-Yan Zhu

Text-to-image personalization aims to teach a pre-trained diffusion model to reason about novel, user provided concepts, embedding them into new scenes guided by natural language prompts. However, current personalization approaches struggle…

Computer Vision and Pattern Recognition · Computer Science 2023-03-07 Rinon Gal , Moab Arar , Yuval Atzmon , Amit H. Bermano , Gal Chechik , Daniel Cohen-Or

Text-to-image diffusion models have shown impressive capabilities in generating realistic visuals from natural-language prompts, yet they often struggle with accurately binding attributes to corresponding objects, especially in prompts…

Computer Vision and Pattern Recognition · Computer Science 2025-05-05 Do Huu Dat , Nam Hyeonu , Po-Yuan Mao , Tae-Hyun Oh

Text-to-image (TTI) diffusion models have demonstrated impressive results in generating high-resolution images of complex and imaginative scenes. Recent approaches have further extended these methods with personalization techniques that…

Computer Vision and Pattern Recognition · Computer Science 2025-05-05 Tanzila Rahman , Shweta Mahajan , Hsin-Ying Lee , Jian Ren , Sergey Tulyakov , Leonid Sigal

Personalized text-to-image generation aims to synthesize images of user-provided concepts in diverse contexts. Despite recent progress in multi-concept personalization, most are limited to object concepts and struggle to customize abstract…

Computer Vision and Pattern Recognition · Computer Science 2026-02-23 Weizhi Zhong , Huan Yang , Zheng Liu , Huiguo He , Zijian He , Xuesong Niu , Di Zhang , Guanbin Li

Humans can visualize new and unknown concepts from their natural language description, based on their experience and previous knowledge. Inspired by this, we present a way to extend this ability to Vision-Language Models (VLMs), teaching…

Computer Vision and Pattern Recognition · Computer Science 2025-12-18 Carlo Alberto Barbano , Luca Molinaro , Massimiliano Ciranni , Emanuele Aiello , Vito Paolo Pastore , Marco Grangetto

Personalized text-to-image generation has attracted unprecedented attention in the recent few years due to its unique capability of generating highly-personalized images via using the input concept dataset and novel textual prompt. However,…

Artificial Intelligence · Computer Science 2024-07-02 Shian Du , Xiaotian Cheng , Qi Qian , Henglu Wei , Yi Xu , Xiangyang Ji

Recent advances in text-to-image diffusion models have substantially improved the quality of image customization, enabling the synthesis of highly realistic images. Despite this progress, achieving fast and efficient personalization remains…

Computer Vision and Pattern Recognition · Computer Science 2026-03-25 Aniket Roy , Maitreya Suin , Rama Chellappa

Our understanding of the visual world is centered around various concept axes, characterizing different aspects of visual entities. While different concept axes can be easily specified by language, e.g. color, the exact visual nuances along…

Computer Vision and Pattern Recognition · Computer Science 2024-04-04 Sharon Lee , Yunzhi Zhang , Shangzhe Wu , Jiajun Wu

Obtaining the human-like perception ability of abstracting visual concepts from concrete pixels has always been a fundamental and important target in machine learning research fields such as disentangled representation learning and scene…

Computer Vision and Pattern Recognition · Computer Science 2022-10-14 Tao Yang , Yuwang Wang , Yan Lu , Nanning Zheng

Large-scale pre-trained Vision-Language Models (VLMs), such as CLIP, establish the correlation between texts and images, achieving remarkable success on various downstream tasks with fine-tuning. In existing fine-tuning methods, the…

Computer Vision and Pattern Recognition · Computer Science 2023-07-31 Yi Zhang , Ce Zhang , Yushun Tang , Zhihai He

While there has been significant progress in customizing text-to-image generation models, generating images that combine multiple personalized concepts remains challenging. In this work, we introduce Concept Weaver, a method for composing…

Computer Vision and Pattern Recognition · Computer Science 2024-04-08 Gihyun Kwon , Simon Jenni , Dingzeyu Li , Joon-Young Lee , Jong Chul Ye , Fabian Caba Heilbron

Current vision-language models (VLMs) show exceptional abilities across diverse tasks, such as visual question answering. To enhance user experience, recent studies investigate VLM personalization to understand user-provided concepts.…

Computer Vision and Pattern Recognition · Computer Science 2025-03-26 Ruichuan An , Sihan Yang , Ming Lu , Renrui Zhang , Kai Zeng , Yulin Luo , Jiajun Cao , Hao Liang , Ying Chen , Qi She , Shanghang Zhang , Wentao Zhang