Related papers: Visual Concept-Metaconcept Learning

The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision

We propose the Neuro-Symbolic Concept Learner (NS-CL), a model that learns visual concepts, words, and semantic parsing of sentences without explicit supervision on any of them; instead, our model learns by simply looking at images and…

Computer Vision and Pattern Recognition · Computer Science 2019-04-30 Jiayuan Mao, Chuang Gan, Pushmeet Kohli, Joshua B. Tenenbaum, Jiajun Wu

If you can describe it, they can see it: Cross-Modal Learning of Visual Concepts from Textual Descriptions

Humans can visualize new and unknown concepts from their natural language description, based on their experience and previous knowledge. Inspired by this, we present a way to extend this ability to Vision-Language Models (VLMs), teaching…

Computer Vision and Pattern Recognition · Computer Science 2025-12-18 Carlo Alberto Barbano, Luca Molinaro, Massimiliano Ciranni, Emanuele Aiello, Vito Paolo Pastore, Marco Grangetto

VLM's Eye Examination: Instruct and Inspect Visual Competency of Vision Language Models

Vision language models (VLMs) have shown promising reasoning capabilities across various benchmarks; however, our understanding of their visual perception remains limited. In this work, we propose an eye examination process to investigate…

Computer Vision and Pattern Recognition · Computer Science 2024-09-24 Nam Hyeon-Woo, Moon Ye-Bin, Wonseok Choi, Lee Hyun, Tae-Hyun Oh

Improving Deep Metric Learning by Divide and Conquer

Deep metric learning (DML) is a cornerstone of many computer vision applications. It aims at learning a mapping from the input domain to an embedding space, where semantically similar objects are located nearby and dissimilar objects far…

Computer Vision and Pattern Recognition · Computer Science 2021-09-10 Artsiom Sanakoyeu, Pingchuan Ma, Vadim Tschernezki, Björn Ommer

Visual Commonsense in Pretrained Unimodal and Multimodal Models

Our commonsense knowledge about objects includes their typical visual attributes; we know that bananas are typically yellow or green, and not purple. Text and image corpora, being subject to reporting bias, represent this world-knowledge to…

Computation and Language · Computer Science 2022-05-05 Chenyu Zhang, Benjamin Van Durme, Zhuowan Li, Elias Stengel-Eskin

FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic descriptions, and Conceptual Relations

We present a meta-learning framework for learning new visual concepts quickly, from just one or a few examples, guided by multiple naturally occurring data streams: simultaneously looking at images, reading sentences that describe the…

Computer Vision and Pattern Recognition · Computer Science 2022-04-01 Lingjie Mei, Jiayuan Mao, Ziqi Wang, Chuang Gan, Joshua B. Tenenbaum

Visual Relationship Detection with Visual-Linguistic Knowledge from Multimodal Representations

Visual relationship detection aims to reason over relationships among salient objects in images, which has drawn increasing attention over the past few years. Inspired by human reasoning mechanisms, it is believed that external visual…

Computer Vision and Pattern Recognition · Computer Science 2021-04-06 Meng-Jiun Chiou, Roger Zimmermann, Jiashi Feng

Multi-modal Large Language Model Enhanced Pseudo 3D Perception Framework for Visual Commonsense Reasoning

The visual commonsense reasoning (VCR) task is to choose an answer and provide a justifying rationale based on the given image and textual question. Representative works first recognize objects in images and then associate them with key…

Computer Vision and Pattern Recognition · Computer Science 2023-12-27 Jian Zhu, Hanli Wang, Miaojing Shi

Visual Superordinate Abstraction for Robust Concept Learning

Concept learning constructs visual representations that are connected to linguistic semantics, which is fundamental to vision-language tasks. Although promising progress has been made, existing concept learners are still vulnerable to…

Computer Vision and Pattern Recognition · Computer Science 2024-04-01 Qi Zheng, Chaoyue Wang, Dadong Wang, Dacheng Tao

Pre-trained Vision-Language Models Learn Discoverable Visual Concepts

Do vision-language models (VLMs) pre-trained to caption an image of a "durian" learn visual concepts such as "brown" (color) and "spiky" (texture) at the same time? We aim to answer this question as visual concepts learned "for free" would…

Computer Vision and Pattern Recognition · Computer Science 2025-01-15 Yuan Zang, Tian Yun, Hao Tan, Trung Bui, Chen Sun

Vision language models have difficulty recognizing virtual objects

Vision language models (VLMs) are AI systems paired with both language and vision encoders to process multimodal input. They are capable of performing complex semantic tasks such as automatic captioning, but it remains an open question…

Computer Vision and Pattern Recognition · Computer Science 2025-05-16 Tyler Tran, Sangeet Khemlani, J. G. Trafton

Towards Visual Semantics

Lexical Semantics is concerned with how words encode mental representations of the world, i.e., concepts. We call this type of concepts classification concepts. In this paper, we focus on Visual Semantics, namely on how humans build…

Artificial Intelligence · Computer Science 2021-09-15 Fausto Giunchiglia, Luca Erculiani, Andrea Passerini

Visual Concept Connectome (VCC): Open World Concept Discovery and their Interlayer Connections in Deep Models

Understanding what deep network models capture in their learned representations is a fundamental challenge in computer vision. We present a new methodology for understanding such vision models, the Visual Concept Connectome (VCC), which…

Computer Vision and Pattern Recognition · Computer Science 2024-04-11 Matthew Kowal, Richard P. Wildes, Konstantinos G. Derpanis

Reading Images Like Texts: Sequential Image Understanding in Vision-Language Models

Vision-Language Models (VLMs) have demonstrated remarkable performance across a variety of real-world tasks. However, existing VLMs typically process visual information by serializing images, a method that diverges significantly from the…

Computer Vision and Pattern Recognition · Computer Science 2026-02-10 Yueyan Li, Chenggong Zhao, Zeyuan Zang, Caixia Yuan, Xiaojie Wang

Evaluating Model Perception of Color Illusions in Photorealistic Scenes

We study the perception of color illusions by vision-language models. Color illusion, where a person's visual system perceives color differently from actual color, is well-studied in human vision. However, it remains underexplored whether…

Computer Vision and Pattern Recognition · Computer Science 2024-12-10 Lingjun Mao, Zineng Tang, Alane Suhr

Learning warps object representations in the ventral temporal cortex

The human ventral temporal cortex (VTC) plays a critical role in object recognition. Although it is well established that visual experience shapes VTC object representations, the impact of semantic and contextual learning is unclear. In…

Neurons and Cognition · Quantitative Biology 2016-04-04 Alex Clarke, Philip J. Pell, Charan Ranganath, Lorraine K. Tyler

ViConEx-Med: Visual Concept Explainability via Multi-Concept Token Transformer for Medical Image Analysis

Concept-based models aim to explain model decisions with human-understandable concepts. However, most existing approaches treat concepts as numerical attributes, without providing complementary visual explanations that could localize the…

Computer Vision and Pattern Recognition · Computer Science 2025-10-14 Cristiano Patrício, Luís F. Teixeira, João C. Neves

Conceptual Learning via Embedding Approximations for Reinforcing Interpretability and Transparency

Concept bottleneck models (CBMs) have emerged as critical tools in domains where interpretability is paramount. These models rely on predefined textual descriptions, referred to as concepts, to inform their decision-making process and offer…

Computer Vision and Pattern Recognition · Computer Science 2024-06-14 Maor Dikter, Tsachi Blau, Chaim Baskin

WinoViz: Probing Visual Properties of Objects Under Different States

Humans perceive and comprehend different visual properties of an object based on specific contexts. For instance, we know that a banana turns brown "when it becomes rotten," whereas it appears green "when it is unripe." Previous studies…

Computation and Language · Computer Science 2024-02-22 Woojeong Jin, Tejas Srinivasan, Jesse Thomason, Xiang Ren

Cross-Modal Concept Learning and Inference for Vision-Language Models

Large-scale pre-trained Vision-Language Models (VLMs), such as CLIP, establish the correlation between texts and images, achieving remarkable success on various downstream tasks with fine-tuning. In existing fine-tuning methods, the…

Computer Vision and Pattern Recognition · Computer Science 2023-07-31 Yi Zhang, Ce Zhang, Yushun Tang, Zhihai He