Related papers: Towards Effective Codebookless Model for Image Cla…

What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models

Large language models (LLMs) have been effectively used for many computer vision tasks, including image classification. In this paper, we present a simple yet effective approach for zero-shot image classification using multimodal LLMs.…

Computer Vision and Pattern Recognition · Computer Science · 2025-06-27 · Abdelrahman Abdelhamed, Mahmoud Afifi, Alec Go

No Labels Needed: Zero-Shot Image Classification with Collaborative Self-Learning

While deep learning, including Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), has significantly advanced classification performance, its typical reliance on extensive annotated datasets presents a major obstacle in…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-24 · Matheus Vinícius Todescato, Joel Luís Carbonera

Lie Algebrized Gaussians for Image Representation

We present an image representation method which is derived from analyzing Gaussian probability density function (pdf) space using Lie group theory. In our proposed method, images are modeled by Gaussian mixture models (GMMs) which…

Computer Vision and Pattern Recognition · Computer Science · 2017-05-11 · Liyu Gong, Meng Chen, Chunlong Hu

Enhancing Fine-Grained Image Classifications via Cascaded Vision Language Models

Fine-grained image classification, particularly in zero/few-shot scenarios, presents a significant challenge for vision-language models (VLMs), such as CLIP. These models often struggle with the nuanced task of distinguishing between…

Computation and Language · Computer Science · 2024-05-21 · Canshi Wei

The Power of One: A Single Example is All it Takes for Segmentation in VLMs

Large-scale vision-language models (VLMs), trained on extensive datasets of image-text pairs, exhibit strong multimodal understanding capabilities by implicitly learning associations between textual descriptions and image regions. This…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-17 · Mir Rayat Imtiaz Hossain, Mennatullah Siam, Leonid Sigal, James J. Little

Highly Efficient Representation and Active Learning Framework and Its Application to Imbalanced Medical Image Classification

We propose a highly data-efficient active learning framework for image classification. Our novel framework combines: (1) unsupervised representation learning of a Convolutional Neural Network and (2) the Gaussian Process (GP) method, in…

Computer Vision and Pattern Recognition · Computer Science · 2022-06-22 · Heng Hao, Hankyu Moon, Sima Didari, Jae Oh Woo, Patrick Bangert

Performance of Gaussian Mixture Model Classifiers on Embedded Feature Spaces

Data embeddings with CLIP and ImageBind provide powerful features for the analysis of multimedia and/or multimodal data. We assess their performance here for classification using a Gaussian mixture model (GMM) based layer as an…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-18 · Jeremy Chopin, Rozenn Dahyot

Cross-Modal Mapping: Mitigating the Modality Gap for Few-Shot Image Classification

Few-shot image classification remains a critical challenge in the field of computer vision, particularly in data-scarce environments. Existing methods typically rely on pre-trained visual-language models, such as CLIP. However, due to the…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-17 · Xi Yang, Pai Peng, Wulin Xie, Xiaohuan Lu, Jie Wen

Few-Shot Classification & Segmentation Using Large Language Models Agent

The task of few-shot image classification and segmentation (FS-CS) requires the classification and segmentation of target objects in a query image, given only a few examples of the target classes. We introduce a method that utilises large…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-22 · Tian Meng, Yang Tao, Wuliang Yin

Lensless-camera based machine learning for image classification

Machine learning (ML) has been widely applied to image classification. Here, we extend this application to data generated by a camera comprised of only a standard CMOS image sensor with no lens. We first created a database of lensless…

Computer Vision and Pattern Recognition · Computer Science · 2017-09-05 · Ganghun Kim, Stefan Kapetanovic, Rachael Palmer, Rajesh Menon

Learning Efficient Image Representation for Person Re-Identification

Color-name-based image representation has been used successfully in person re-identification because it is compact, intuitively understandable, and robust to photometric variance. However, there exists the diversity…

Computer Vision and Pattern Recognition · Computer Science · 2017-07-11 · Yang Yang, Shengcai Liao, Zhen Lei, Stan Z. Li

CLIP-Free, Label Free, Unsupervised Concept Bottleneck Models

Concept Bottleneck Models (CBMs) map dense feature representations into human-interpretable concepts which are then combined linearly to make a prediction. However, modern CBMs rely on the CLIP model to obtain image-concept annotations, and…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-27 · Fawaz Sammani, Jonas Fischer, Nikos Deligiannis

Vision-Free Retrieval: Rethinking Multimodal Search with Textual Scene Descriptions

Contrastively-trained Vision-Language Models (VLMs), such as CLIP, have become the standard approach for learning discriminative vision-language representations. However, these models often exhibit shallow language understanding,…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-24 · Ioanna Ntinou, Alexandros Xenos, Yassine Ouali, Adrian Bulat, Georgios Tzimiropoulos

Image Recognition with Vision and Language Embeddings of VLMs

Vision-language models (VLMs) have enabled strong zero-shot classification through image-text alignment. Yet, their purely visual inference capabilities remain under-explored. In this work, we conduct a comprehensive evaluation of both…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-12 · Illia Volkov, Nikita Kisel, Klara Janouskova, Jiri Matas

Introduction to the Bag of Features Paradigm for Image Classification and Retrieval

The past decade has seen the growing popularity of Bag of Features (BoF) approaches to many computer vision tasks, including image classification, video search, robot localization, and texture recognition. Part of the appeal is simplicity.…

Computer Vision and Pattern Recognition · Computer Science · 2011-01-19 · Stephen O'Hara, Bruce A. Draper

FLIM Networks with Bag of Feature Points

Convolutional networks require extensive image annotation, which can be costly and time-consuming. Feature Learning from Image Markers (FLIM) tackles this challenge by estimating encoder filters (i.e., kernel weights) from user-drawn…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-22 · João Deltregia Martinelli, Marcelo Luis Rodrigues Filho, Felipe Crispim da Rocha Salvagnini, Gilson Junior Soares, Jefersson A. dos Santos, Alexandre X. Falcão

Bayesian Exploration of Pre-trained Models for Low-shot Image Classification

Low-shot image classification is a fundamental task in computer vision, and the emergence of large-scale vision-language models such as CLIP has greatly advanced the forefront of research in this field. However, most existing CLIP-based…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-02 · Yibo Miao, Yu Lei, Feng Zhou, Zhijie Deng

Zero-Shot Prompting and Few-Shot Fine-Tuning: Revisiting Document Image Classification Using Large Language Models

Classifying scanned documents is a challenging problem that involves image, layout, and text analysis for document understanding. Nevertheless, for certain benchmark datasets, notably RVL-CDIP, the state of the art is closing in to…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-19 · Anna Scius-Bertrand, Michael Jungo, Lars Vögtlin, Jean-Marc Spat, Andreas Fischer

Learned Image Compression with Gaussian-Laplacian-Logistic Mixture Model and Concatenated Residual Modules

Recently, deep learning-based image compression methods have made significant progress and gradually outperformed traditional approaches, including the latest standard Versatile Video Coding (VVC), in both PSNR and MS-SSIM metrics. Two…

Image and Video Processing · Electrical Eng. & Systems · 2024-02-13 · Haisheng Fu, Feng Liang, Jianping Lin, Bing Li, Mohammad Akbari, Jie Liang, Guohe Zhang, Dong Liu, Chengjie Tu, Jingning Han

Zero-Shot Fine-Grained Image Classification Using Large Vision-Language Models

Large Vision-Language Models (LVLMs) have demonstrated impressive performance on vision-language reasoning tasks. However, their potential for zero-shot fine-grained image classification, a challenging task requiring precise differentiation…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-07 · Md. Atabuzzaman, Andrew Zhang, Chris Thomas