Related papers: Personalizing Pre-trained Models

Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations

Fine-tuning pre-trained vision-language models, like CLIP, has yielded success on diverse downstream tasks. However, several pain points persist for this paradigm: (i) directly tuning entire pre-trained models becomes both time-intensive…

Computer Vision and Pattern Recognition · Computer Science 2024-11-05 Chenyu You, Yifei Min, Weicheng Dai, Jasjeet S. Sekhon, Lawrence Staib, James S. Duncan

Pre-Trained Vision-Language Models as Partial Annotators

Pre-trained vision-language models learn massive data to model unified representations of images and natural languages, which can be widely applied to downstream machine learning tasks. In addition to zero-shot inference, in order to better…

Computer Vision and Pattern Recognition · Computer Science 2024-06-28 Qian-Wei Wang, Yuqiu Xie, Letian Zhang, Zimo Liu, Shu-Tao Xia

Improving Zero-Shot Models with Label Distribution Priors

Labeling large image datasets with attributes such as facial age or object type is tedious and sometimes infeasible. Supervised machine learning methods provide a highly accurate solution, but require manual labels which are often…

Computer Vision and Pattern Recognition · Computer Science 2022-12-02 Jonathan Kahana, Niv Cohen, Yedid Hoshen

Enhancing CLIP with CLIP: Exploring Pseudolabeling for Limited-Label Prompt Tuning

Fine-tuning vision-language models (VLMs) like CLIP to downstream tasks is often necessary to optimize their performance. However, a major obstacle is the limited availability of labeled data. We study the use of pseudolabels, i.e.,…

Computer Vision and Pattern Recognition · Computer Science 2024-03-11 Cristina Menghini, Andrew Delworth, Stephen H. Bach

Adapting OpenAI's CLIP Model for Few-Shot Image Inspection in Manufacturing Quality Control: An Expository Case Study with Multiple Application Examples

This expository paper introduces a simplified approach to image-based quality inspection in manufacturing using OpenAI's CLIP (Contrastive Language-Image Pretraining) model adapted for few-shot learning. While CLIP has demonstrated…

Computer Vision and Pattern Recognition · Computer Science 2025-07-15 Fadel M. Megahed, Ying-Ju Chen, Bianca Maria Colosimo, Marco Luigi Giuseppe Grasso, L. Allison Jones-Farmer, Sven Knoth, Hongyue Sun, Inez Zwetsloot

Memory-Efficient Continual Learning with CLIP Models

Contrastive Language-Image Pretraining (CLIP) models excel at understanding image-text relationships but struggle with adapting to new data without forgetting prior knowledge. To address this, models are typically fine-tuned using both new…

Machine Learning · Computer Science 2026-05-06 Ryan King, Gang Li, Bobak Mortazavi, Tianbao Yang

Bayesian Exploration of Pre-trained Models for Low-shot Image Classification

Low-shot image classification is a fundamental task in computer vision, and the emergence of large-scale vision-language models such as CLIP has greatly advanced the forefront of research in this field. However, most existing CLIP-based…

Computer Vision and Pattern Recognition · Computer Science 2024-04-02 Yibo Miao, Yu Lei, Feng Zhou, Zhijie Deng

Masked Unsupervised Self-training for Label-free Image Classification

State-of-the-art computer vision models are mostly trained with supervised learning using human-labeled images, which limits their scalability due to the expensive annotation cost. While self-supervised representation learning has achieved…

Computer Vision and Pattern Recognition · Computer Science 2023-03-13 Junnan Li, Silvio Savarese, Steven C. H. Hoi

Lightweight Unsupervised Federated Learning with Pretrained Vision Language Model

Federated learning aims to tackle the "isolated data island" problem, where it trains a collective model from physically isolated clients while safeguarding the privacy of users' data. However, supervised federated learning necessitates…

Artificial Intelligence · Computer Science 2024-04-18 Hao Yan, Yuhong Guo

CLIP model is an Efficient Continual Learner

The continual learning setting aims to learn new tasks over time without forgetting the previous ones. The literature reports several significant efforts to tackle this problem with limited or no access to previous task data. Among such…

Computer Vision and Pattern Recognition · Computer Science 2022-10-07 Vishal Thengane, Salman Khan, Munawar Hayat, Fahad Khan

Safeguarding Data in Multimodal AI: A Differentially Private Approach to CLIP Training

The surge in multimodal AI's success has sparked concerns over data privacy in vision-and-language tasks. While CLIP has revolutionized multimodal learning through joint training on images and text, its potential to unintentionally disclose…

Machine Learning · Computer Science 2024-03-04 Alyssa Huang, Peihan Liu, Ryumei Nakada, Linjun Zhang, Wanrong Zhang

CLIP-Guided Adaptable Self-Supervised Learning for Human-Centric Visual Tasks

Human-centric visual analysis plays a pivotal role in diverse applications, including surveillance, healthcare, and human-computer interaction. With the emergence of large-scale unlabeled human image datasets, there is an increasing need…

Computer Vision and Pattern Recognition · Computer Science 2026-01-21 Mingshuang Luo, Ruibing Hou, Bo Chao, Hong Chang, Zimo Liu, Yaowei Wang, Shiguang Shan

DiffCLIP: Few-shot Language-driven Multimodal Classifier

Visual language models like Contrastive Language-Image Pretraining (CLIP) have shown impressive performance in analyzing natural images with language information. However, these models often encounter challenges when applied to specialized…

Computer Vision and Pattern Recognition · Computer Science 2024-12-11 Jiaqing Zhang, Mingxiang Cao, Xue Yang, Kai Jiang, Yunsong Li

Semi-Supervised Learning via Cross-Prediction-Powered Inference for Wireless Systems

In many wireless application scenarios, acquiring labeled data can be prohibitively costly, requiring complex optimization processes or measurement campaigns. Semi-supervised learning leverages unlabeled samples to augment the available…

Information Theory · Computer Science 2024-10-08 Houssem Sifaou, Osvaldo Simeone

Class Incremental Learning with Pre-trained Vision-Language Models

With the advent of large-scale pre-trained models, interest in adapting and exploiting them for continual learning scenarios has grown. In this paper, we propose an approach to exploiting pre-trained vision-language models (e.g. CLIP) that…

Computer Vision and Pattern Recognition · Computer Science 2023-11-01 Xialei Liu, Xusheng Cao, Haori Lu, Jia-wen Xiao, Andrew D. Bagdanov, Ming-Ming Cheng

Classifier-guided CLIP Distillation for Unsupervised Multi-label Classification

Multi-label classification is crucial for comprehensive image understanding, yet acquiring accurate annotations is challenging and costly. To address this, a recent study suggests exploiting unsupervised multi-label classification…

Computer Vision and Pattern Recognition · Computer Science 2025-03-24 Dongseob Kim, Hyunjung Shim

CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification

This paper presents a CLIP-based unsupervised learning method for annotation-free multi-label image classification, including three stages: initialization, training, and inference. At the initialization stage, we take full advantage of the…

Computer Vision and Pattern Recognition · Computer Science 2024-03-08 Rabab Abdelfattah, Qing Guo, Xiaoguang Li, Xiaofeng Wang, Song Wang

Meta-Adapter: An Online Few-shot Learner for Vision-Language Model

The contrastive vision-language pre-training, known as CLIP, demonstrates remarkable potential in perceiving open-world visual concepts, enabling effective zero-shot image recognition. Nevertheless, few-shot learning methods based on CLIP…

Computer Vision and Pattern Recognition · Computer Science 2024-01-12 Cheng Cheng, Lin Song, Ruoyi Xue, Hang Wang, Hongbin Sun, Yixiao Ge, Ying Shan

Transfer of Pretrained Model Weights Substantially Improves Semi-Supervised Image Classification

Deep neural networks produce state-of-the-art results when trained on a large number of labeled examples but tend to overfit when small amounts of labeled examples are used for training. Creating a large number of labeled examples requires…

Computer Vision and Pattern Recognition · Computer Science 2021-09-13 Attaullah Sahito, Eibe Frank, Bernhard Pfahringer

Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models

The advent of large pre-trained models has brought about a paradigm shift in both visual representation learning and natural language processing. However, clustering unlabeled images, as a fundamental and classic machine learning problem,…

Computer Vision and Pattern Recognition · Computer Science 2024-04-29 Tianzhe Chu, Shengbang Tong, Tianjiao Ding, Xili Dai, Benjamin David Haeffele, René Vidal, Yi Ma