Related papers: Towards Effective Codebookless Model for Image Cla…

200 papers

Large language models (LLMs) have been effectively used for many computer vision tasks, including image classification. In this paper, we present a simple yet effective approach for zero-shot image classification using multimodal LLMs.…

Computer Vision and Pattern Recognition · Computer Science 2025-06-27 Abdelrahman Abdelhamed, Mahmoud Afifi, Alec Go

While deep learning, including Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), has significantly advanced classification performance, its typical reliance on extensive annotated datasets presents a major obstacle in…

Computer Vision and Pattern Recognition · Computer Science 2025-09-24 Matheus Vinícius Todescato, Joel Luís Carbonera

We present an image representation method which is derived from analyzing Gaussian probability density function (pdf) space using Lie group theory. In our proposed method, images are modeled by Gaussian mixture models (GMMs) which…

Computer Vision and Pattern Recognition · Computer Science 2017-05-11 Liyu Gong, Meng Chen, Chunlong Hu

Fine-grained image classification, particularly in zero/few-shot scenarios, presents a significant challenge for vision-language models (VLMs), such as CLIP. These models often struggle with the nuanced task of distinguishing between…

Computation and Language · Computer Science 2024-05-21 Canshi Wei

Large-scale vision-language models (VLMs), trained on extensive datasets of image-text pairs, exhibit strong multimodal understanding capabilities by implicitly learning associations between textual descriptions and image regions. This…

Computer Vision and Pattern Recognition · Computer Science 2025-03-17 Mir Rayat Imtiaz Hossain, Mennatullah Siam, Leonid Sigal, James J. Little

We propose a highly data-efficient active learning framework for image classification. Our novel framework combines: (1) unsupervised representation learning of a Convolutional Neural Network and (2) the Gaussian Process (GP) method, in…

Computer Vision and Pattern Recognition · Computer Science 2022-06-22 Heng Hao, Hankyu Moon, Sima Didari, Jae Oh Woo, Patrick Bangert

Data embeddings with CLIP and ImageBind provide powerful features for the analysis of multimedia and/or multimodal data. We assess their performance here for classification using a Gaussian mixture model (GMM) based layer as an…

Computer Vision and Pattern Recognition · Computer Science 2024-10-18 Jeremy Chopin, Rozenn Dahyot

Few-shot image classification remains a critical challenge in the field of computer vision, particularly in data-scarce environments. Existing methods typically rely on pre-trained visual-language models, such as CLIP. However, due to the…

Computer Vision and Pattern Recognition · Computer Science 2026-02-17 Xi Yang, Pai Peng, Wulin Xie, Xiaohuan Lu, Jie Wen

The task of few-shot image classification and segmentation (FS-CS) requires the classification and segmentation of target objects in a query image, given only a few examples of the target classes. We introduce a method that utilises large…

Computer Vision and Pattern Recognition · Computer Science 2023-11-22 Tian Meng, Yang Tao, Wuliang Yin

Machine learning (ML) has been widely applied to image classification. Here, we extend this application to data generated by a camera comprised of only a standard CMOS image sensor with no lens. We first created a database of lensless…

Computer Vision and Pattern Recognition · Computer Science 2017-09-05 Ganghun Kim, Stefan Kapetanovic, Rachael Palmer, Rajesh Menon

Color names based image representation is successfully used in person re-identification, due to the advantages of being compact, intuitively understandable as well as being robust to photometric variance. However, there exists the diversity…

Computer Vision and Pattern Recognition · Computer Science 2017-07-11 Yang Yang, Shengcai Liao, Zhen Lei, Stan Z. Li

Concept Bottleneck Models (CBMs) map dense feature representations into human-interpretable concepts which are then combined linearly to make a prediction. However, modern CBMs rely on the CLIP model to obtain image-concept annotations, and…

Computer Vision and Pattern Recognition · Computer Science 2026-02-27 Fawaz Sammani, Jonas Fischer, Nikos Deligiannis

Contrastively-trained Vision-Language Models (VLMs), such as CLIP, have become the standard approach for learning discriminative vision-language representations. However, these models often exhibit shallow language understanding,…

Computer Vision and Pattern Recognition · Computer Science 2025-09-24 Ioanna Ntinou, Alexandros Xenos, Yassine Ouali, Adrian Bulat, Georgios Tzimiropoulos

Vision-language models (VLMs) have enabled strong zero-shot classification through image-text alignment. Yet, their purely visual inference capabilities remain under-explored. In this work, we conduct a comprehensive evaluation of both…

Computer Vision and Pattern Recognition · Computer Science 2025-09-12 Illia Volkov, Nikita Kisel, Klara Janouskova, Jiri Matas

The past decade has seen the growing popularity of Bag of Features (BoF) approaches to many computer vision tasks, including image classification, video search, robot localization, and texture recognition. Part of the appeal is simplicity.…

Computer Vision and Pattern Recognition · Computer Science 2011-01-19 Stephen O'Hara, Bruce A. Draper

Convolutional networks require extensive image annotation, which can be costly and time-consuming. Feature Learning from Image Markers (FLIM) tackles this challenge by estimating encoder filters (i.e., kernel weights) from user-drawn…

Low-shot image classification is a fundamental task in computer vision, and the emergence of large-scale vision-language models such as CLIP has greatly advanced the forefront of research in this field. However, most existing CLIP-based…

Computer Vision and Pattern Recognition · Computer Science 2024-04-02 Yibo Miao, Yu Lei, Feng Zhou, Zhijie Deng

Classifying scanned documents is a challenging problem that involves image, layout, and text analysis for document understanding. Nevertheless, for certain benchmark datasets, notably RVL-CDIP, the state of the art is closing in to…

Computer Vision and Pattern Recognition · Computer Science 2024-12-19 Anna Scius-Bertrand, Michael Jungo, Lars Vögtlin, Jean-Marc Spat, Andreas Fischer

Recently, deep learning-based image compression methods have made significant progress and gradually outperformed traditional approaches, including the latest standard Versatile Video Coding (VVC), in both PSNR and MS-SSIM metrics. Two…

Image and Video Processing · Electrical Eng. & Systems 2024-02-13 Haisheng Fu, Feng Liang, Jianping Lin, Bing Li, Mohammad Akbari, Jie Liang, Guohe Zhang, Dong Liu, Chengjie Tu, Jingning Han

Large Vision-Language Models (LVLMs) have demonstrated impressive performance on vision-language reasoning tasks. However, their potential for zero-shot fine-grained image classification, a challenging task requiring precise differentiation…

Computer Vision and Pattern Recognition · Computer Science 2025-10-07 Md. Atabuzzaman, Andrew Zhang, Chris Thomas