Related papers: Visual Concepts Tokenization

General Image-to-Image Translation with One-Shot Image Guidance

Large-scale text-to-image models pre-trained on massive text-image pairs have recently shown excellent performance in image synthesis. However, images can provide more intuitive visual concepts than plain text. People may ask: how can we integrate…

Computer Vision and Pattern Recognition · Computer Science 2023-09-21 Bin Cheng, Zuhao Liu, Yunbo Peng, Yue Lin

A Comprehensive Survey on Visual Concept Mining in Text-to-image Diffusion Models

Text-to-image diffusion models have made significant advancements in generating high-quality, diverse images from text prompts. However, the inherent limitations of textual signals often prevent these models from fully capturing specific…

Computer Vision and Pattern Recognition · Computer Science 2025-03-19 Ziqiang Li, Jun Li, Lizhi Xiong, Zhangjie Fu, Zechao Li

Superpixel Tokenization for Vision Transformers: Preserving Semantic Integrity in Visual Tokens

Transformers, a groundbreaking architecture proposed for Natural Language Processing (NLP), have also achieved remarkable success in Computer Vision. A cornerstone of their success lies in the attention mechanism, which models relationships…

Computer Vision and Pattern Recognition · Computer Science 2025-03-26 Jaihyun Lew, Soohyuk Jang, Jaehoon Lee, Seungryong Yoo, Eunji Kim, Saehyung Lee, Jisoo Mok, Siwon Kim, Sungroh Yoon

VcT: Visual change Transformer for Remote Sensing Image Change Detection

Existing visual change detectors usually adopt CNNs or Transformers for feature representation learning and focus on learning effective representation for the changed regions between images. Although good performance can be obtained by…

Computer Vision and Pattern Recognition · Computer Science 2023-10-18 Bo Jiang, Zitian Wang, Xixi Wang, Ziyan Zhang, Lan Chen, Xiao Wang, Bin Luo

Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement

Disentangled representation learning strives to extract the intrinsic factors within observed data. Factorizing these representations in an unsupervised manner is notably challenging and usually requires tailored loss functions or specific…

Computer Vision and Pattern Recognition · Computer Science 2024-06-13 Tao Yang, Cuiling Lan, Yan Lu, Nanning Zheng

CusConcept: Customized Visual Concept Decomposition with Diffusion Models

Enabling generative models to decompose visual concepts from a single image is a complex and challenging problem. In this paper, we study a new and challenging task, customized concept decomposition, wherein the objective is to leverage…

Computer Vision and Pattern Recognition · Computer Science 2024-10-02 Zhi Xu, Shaozhe Hao, Kai Han

ICED: Concept-level Machine Unlearning via Interpretable Concept Decomposition

Machine unlearning in Vision-Language Models (VLMs) is typically performed at the image or instance level, making it difficult to precisely remove target knowledge without affecting unrelated semantics. This issue is especially pronounced…

Computer Vision and Pattern Recognition · Computer Science 2026-05-18 Shen Lin, Jing Lin, Junhao Dong, Piotr Koniusz, Li Xu

OmniPrism: Learning Disentangled Visual Concept for Image Generation

Creative visual concept generation often draws inspiration from specific concepts in a reference image to produce relevant outcomes. However, existing methods are typically constrained to single-aspect concept generation or are easily…

Computer Vision and Pattern Recognition · Computer Science 2026-04-13 Yangyang Li, Daqing Liu, Wu Liu, Allen He, Xinchen Liu, Yongdong Zhang, Guoqing Jin

SegDiscover: Visual Concept Discovery via Unsupervised Semantic Segmentation

Visual concept discovery has long been deemed important to improve interpretability of neural networks, because a bank of semantically meaningful concepts would provide us with a starting point for building machine learning models that…

Computer Vision and Pattern Recognition · Computer Science 2022-04-26 Haiyang Huang, Zhi Chen, Cynthia Rudin

ViConEx-Med: Visual Concept Explainability via Multi-Concept Token Transformer for Medical Image Analysis

Concept-based models aim to explain model decisions with human-understandable concepts. However, most existing approaches treat concepts as numerical attributes, without providing complementary visual explanations that could localize the…

Computer Vision and Pattern Recognition · Computer Science 2025-10-14 Cristiano Patrício, Luís F. Teixeira, João C. Neves

Learning Concept-Based Causal Transition and Symbolic Reasoning for Visual Planning

Visual planning simulates how humans make decisions to achieve desired goals in the form of searching for visual causal transitions between an initial visual state and a final visual goal state. It has become increasingly important in…

Artificial Intelligence · Computer Science 2024-03-28 Yilue Qian, Peiyu Yu, Ying Nian Wu, Yao Su, Wei Wang, Lifeng Fan

ConceptPrism: Concept Disentanglement in Personalized Diffusion Models via Residual Token Optimization

Personalized text-to-image (T2I) generation has emerged as a key application for creating user-specific concepts from a few reference images. The core challenge is concept disentanglement: separating the target concept from irrelevant…

Computer Vision and Pattern Recognition · Computer Science 2026-03-31 Minseo Kim, Minchan Kwon, Dongyeun Lee, Yunho Jeon, Junmo Kim

Unsupervised learning of object semantic parts from internal states of CNNs by population encoding

We address the key question of how object part representations can be found from the internal states of CNNs that are trained for high-level tasks, such as object classification. This work provides a new unsupervised method to learn…

Machine Learning · Computer Science 2016-11-15 Jianyu Wang, Zhishuai Zhang, Cihang Xie, Vittal Premachandran, Alan Yuille

Language-Guided Visual Perception Disentanglement for Image Quality Assessment and Conditional Image Generation

Contrastive vision-language models, such as CLIP, have demonstrated excellent zero-shot capability across semantic recognition tasks, mainly attributed to the training on a large-scale I&1T (one Image with one Text) dataset. This kind of…

Computer Vision and Pattern Recognition · Computer Science 2025-03-05 Zhichao Yang, Leida Li, Pengfei Chen, Jinjian Wu, Giuseppe Valenzise

If you can describe it, they can see it: Cross-Modal Learning of Visual Concepts from Textual Descriptions

Humans can visualize new and unknown concepts from their natural language description, based on their experience and previous knowledge. Inspired by this, we present a way to extend this ability to Vision-Language Models (VLMs), teaching…

Computer Vision and Pattern Recognition · Computer Science 2025-12-18 Carlo Alberto Barbano, Luca Molinaro, Massimiliano Ciranni, Emanuele Aiello, Vito Paolo Pastore, Marco Grangetto

CAT: Cross Attention in Vision Transformer

Since Transformer has found widespread use in NLP, the potential of Transformer in CV has been realized and has inspired many new approaches. However, the computation required for replacing word tokens with image patches for Transformer…

Computer Vision and Pattern Recognition · Computer Science 2021-06-11 Hezheng Lin, Xing Cheng, Xiangyu Wu, Fan Yang, Dong Shen, Zhongyuan Wang, Qing Song, Wei Yuan

Per-Query Visual Concept Learning

Visual concept learning, also known as text-to-image personalization, is the process of teaching new concepts to a pretrained model. This has numerous applications, from product placement to entertainment and personalized design. Here we…

Computer Vision and Pattern Recognition · Computer Science 2025-08-13 Ori Malca, Dvir Samuel, Gal Chechik

Language-Informed Visual Concept Learning

Our understanding of the visual world is centered around various concept axes, characterizing different aspects of visual entities. While different concept axes can be easily specified by language, e.g. color, the exact visual nuances along…

Computer Vision and Pattern Recognition · Computer Science 2024-04-04 Sharon Lee, Yunzhi Zhang, Shangzhe Wu, Jiajun Wu

Early Visual Concept Learning with Unsupervised Deep Learning

Automated discovery of early visual concepts from raw image data is a major open challenge in AI research. Addressing this problem, we propose an unsupervised approach for learning disentangled representations of the underlying factors of…

Machine Learning · Statistics 2016-09-21 Irina Higgins, Loic Matthey, Xavier Glorot, Arka Pal, Benigno Uria, Charles Blundell, Shakir Mohamed, Alexander Lerchner

TCIC: Theme Concepts Learning Cross Language and Vision for Image Captioning

Existing research for image captioning usually represents an image using a scene graph with low-level facts (objects and relations) and fails to capture the high-level semantics. In this paper, we propose a Theme Concepts extended Image…

Computer Vision and Pattern Recognition · Computer Science 2021-06-22 Zhihao Fan, Zhongyu Wei, Siyuan Wang, Ruize Wang, Zejun Li, Haijun Shan, Xuanjing Huang