Related papers: Concept Generalization in Visual Representation Le…

Modeling Multiple Views via Implicitly Preserving Global Consistency and Local Complementarity

While self-supervised learning techniques are often used to mine implicit knowledge from unlabeled data via modeling multiple views, it is unclear how to perform effective representation learning in a complex and inconsistent context. To…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-10 · Jiangmeng Li, Wenwen Qiang, Changwen Zheng, Bing Su, Farid Razzak, Ji-Rong Wen, Hui Xiong

Perceptual Group Tokenizer: Building Perception with Iterative Grouping

The human visual recognition system shows an astonishing capability of compressing visual information into a set of tokens containing rich representations without label supervision. One critical driving principle behind it is perceptual grouping.…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-26 · Zhiwei Deng, Ting Chen, Yang Li

Compositional Generalization in Image Captioning

Image captioning models are usually evaluated on their ability to describe a held-out set of images, not on their ability to generalize to unseen concepts. We study the problem of compositional generalization, which measures how well a…

Machine Learning · Computer Science · 2019-11-12 · Mitja Nikolaus, Mostafa Abdou, Matthew Lamm, Rahul Aralikatte, Desmond Elliott

Visual Concepts Tokenization

Obtaining the human-like perception ability of abstracting visual concepts from concrete pixels has always been a fundamental and important target in machine learning research fields such as disentangled representation learning and scene…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-14 · Tao Yang, Yuwang Wang, Yan Lu, Nanning Zheng

V-PROM: A Benchmark for Visual Reasoning Using Visual Progressive Matrices

One of the primary challenges faced by deep learning is the degree to which current methods exploit superficial statistics and dataset bias, rather than learning to generalise over the specific representations they have experienced. This is…

Computer Vision and Pattern Recognition · Computer Science · 2019-07-30 · Damien Teney, Peng Wang, Jiewei Cao, Lingqiao Liu, Chunhua Shen, Anton van den Hengel

UniTok: A Unified Tokenizer for Visual Generation and Understanding

Visual generative and understanding models typically rely on distinct tokenizers to process images, presenting a key challenge for unifying them within a single framework. Recent studies attempt to address this by connecting the training of…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-27 · Chuofan Ma, Yi Jiang, Junfeng Wu, Jihan Yang, Xin Yu, Zehuan Yuan, Bingyue Peng, Xiaojuan Qi

Visual-Semantic Embedding Model Informed by Structured Knowledge

We propose a novel approach to improve a visual-semantic embedding model by incorporating concept representations captured from an external structured knowledge base. We investigate its performance on image classification under both…

Computer Vision and Pattern Recognition · Computer Science · 2020-09-22 · Mirantha Jayathilaka, Tingting Mu, Uli Sattler

Exploring Hierarchical Graph Representation for Large-Scale Zero-Shot Image Classification

The main question we address in this paper is how to scale up visual recognition of unseen classes, also known as zero-shot learning, to tens of thousands of categories as in the ImageNet-21K benchmark. At this scale, especially with many…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-20 · Kai Yi, Xiaoqian Shen, Yunhao Gou, Mohamed Elhoseiny

Learning an Adaptation Function to Assess Image Visual Similarities

Human perception routinely assesses the similarity between images, both for decision making and creative thinking. But the underlying cognitive process is not yet well understood, hence difficult to be mimicked by computer vision…

Computer Vision and Pattern Recognition · Computer Science · 2022-06-06 · Olivier Risser-Maroix, Amine Marzouki, Hala Djeghim, Camille Kurtz, Nicolas Lomenie

Concept Learners for Few-Shot Learning

Developing algorithms that are able to generalize to a novel task given only a few labeled examples represents a fundamental challenge in closing the gap between machine- and human-level performance. The core of human cognition lies in the…

Machine Learning · Computer Science · 2021-03-23 · Kaidi Cao, Maria Brbic, Jure Leskovec

Representation Based Complexity Measures for Predicting Generalization in Deep Learning

Deep Neural Networks can generalize despite being significantly overparametrized. Recent research has tried to examine this phenomenon from various viewpoints and to provide bounds on the generalization error or measures predictive of the…

Machine Learning · Computer Science · 2020-12-07 · Parth Natekar, Manik Sharma

Towards Modality Generalization: A Benchmark and Prospective Analysis

Multi-modal learning has achieved remarkable success by integrating information from various modalities, achieving superior performance in tasks like recognition and retrieval compared to uni-modal approaches. However, real-world scenarios…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-05 · Xiaohao Liu, Xiaobo Xia, Zhuo Huang, See-Kiong Ng, Tat-Seng Chua

Multimodal Contrastive Training for Visual Representation Learning

We develop an approach to learning visual representations that embraces multimodal data, driven by a combination of intra- and inter-modal similarity preservation objectives. Unlike existing visual pre-training methods, which solve a proxy…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-28 · Xin Yuan, Zhe Lin, Jason Kuen, Jianming Zhang, Yilin Wang, Michael Maire, Ajinkya Kale, Baldo Faieta

Generalizable Imitation Learning Through Pre-Trained Representations

In this paper, we leverage self-supervised vision transformer models and their emergent semantic abilities to improve the generalization abilities of imitation learning policies. We introduce DVK, an imitation learning algorithm that…

Robotics · Computer Science · 2025-03-12 · Wei-Di Chang, Francois Hogan, Scott Fujimoto, David Meger, Gregory Dudek

Extracting Visual Knowledge from the Internet: Making Sense of Image Data

Recent successes in visual recognition can be primarily attributed to feature representation, learning algorithms, and the ever-increasing size of labeled training data. Extensive research has been devoted to the first two, but much less…

Computer Vision and Pattern Recognition · Computer Science · 2019-06-10 · Yazhou Yao, Jian Zhang, Xiansheng Hua, Fumin Shen, Zhenmin Tang

On the Performance of Concept Probing: The Influence of the Data (Extended Version)

Concept probing has recently garnered increasing interest as a way to help interpret artificial neural networks, dealing both with their typically large size and their subsymbolic nature, which ultimately renders them unfeasible for direct…

Artificial Intelligence · Computer Science · 2025-07-25 · Manuel de Sousa Ribeiro, Afonso Leote, João Leite

Learning Representations by Predicting Bags of Visual Words

Self-supervised representation learning aims to learn convnet-based image representations from unlabeled data. Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially…

Computer Vision and Pattern Recognition · Computer Science · 2020-02-28 · Spyros Gidaris, Andrei Bursuc, Nikos Komodakis, Patrick Pérez, Matthieu Cord

Are Object-Centric Representations Better At Compositional Generalization?

Compositional generalization, the ability to reason about novel combinations of familiar concepts, is fundamental to human cognition and a critical challenge for machine learning. Object-centric (OC) representations, which encode a scene as…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-19 · Ferdinand Kapl, Amir Mohammad Karimi Mamaghan, Maximilian Seitzer, Karl Henrik Johansson, Carsten Marr, Stefan Bauer, Andrea Dittadi

Unsupervised learning of object semantic parts from internal states of CNNs by population encoding

We address the key question of how object part representations can be found from the internal states of CNNs that are trained for high-level tasks, such as object classification. This work provides a new unsupervised method to learn…

Machine Learning · Computer Science · 2016-11-15 · Jianyu Wang, Zhishuai Zhang, Cihang Xie, Vittal Premachandran, Alan Yuille

Network Dissection: Quantifying Interpretability of Deep Visual Representations

We propose a general framework called Network Dissection for quantifying the interpretability of latent representations of CNNs by evaluating the alignment between individual hidden units and a set of semantic concepts. Given any CNN model,…

Computer Vision and Pattern Recognition · Computer Science · 2017-04-20 · David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, Antonio Torralba