Related papers: CDUL: CLIP-Driven Unsupervised Learning for Multi-…

Reproducibility Study of CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification

This report is a reproducibility study of the paper "CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification" (Abdelfattah et al., ICCV 2023). Our report makes the following contributions: (1) We provide a reproducible,…

Computer Vision and Pattern Recognition · Computer Science 2024-05-21 Manan Shah, Yash Bhalgat

Classifier-guided CLIP Distillation for Unsupervised Multi-label Classification

Multi-label classification is crucial for comprehensive image understanding, yet acquiring accurate annotations is challenging and costly. To address this, a recent study suggests exploiting unsupervised multi-label classification…

Computer Vision and Pattern Recognition · Computer Science 2025-03-24 Dongseob Kim, Hyunjung Shim

Masked Unsupervised Self-training for Label-free Image Classification

State-of-the-art computer vision models are mostly trained with supervised learning using human-labeled images, which limits their scalability due to the expensive annotation cost. While self-supervised representation learning has achieved…

Computer Vision and Pattern Recognition · Computer Science 2023-03-13 Junnan Li, Silvio Savarese, Steven C. H. Hoi

CLIP-Guided Adaptable Self-Supervised Learning for Human-Centric Visual Tasks

Human-centric visual analysis plays a pivotal role in diverse applications, including surveillance, healthcare, and human-computer interaction. With the emergence of large-scale unlabeled human image datasets, there is an increasing need…

Computer Vision and Pattern Recognition · Computer Science 2026-01-21 Mingshuang Luo, Ruibing Hou, Bo Chao, Hong Chang, Zimo Liu, Yaowei Wang, Shiguang Shan

Enhancing CLIP with CLIP: Exploring Pseudolabeling for Limited-Label Prompt Tuning

Fine-tuning vision-language models (VLMs) like CLIP to downstream tasks is often necessary to optimize their performance. However, a major obstacle is the limited availability of labeled data. We study the use of pseudolabels, i.e.,…

Computer Vision and Pattern Recognition · Computer Science 2024-03-11 Cristina Menghini, Andrew Delworth, Stephen H. Bach

Multi-label Cluster Discrimination for Visual Representation Learning

Contrastive Language Image Pre-training (CLIP) has recently demonstrated success across various tasks due to superior feature representation empowered by image-text contrastive learning. However, the instance discrimination method used by…

Computer Vision and Pattern Recognition · Computer Science 2024-11-07 Xiang An, Kaicheng Yang, Xiangzi Dai, Ziyong Feng, Jiankang Deng

TagCLIP: A Local-to-Global Framework to Enhance Open-Vocabulary Multi-Label Classification of CLIP Without Training

Contrastive Language-Image Pre-training (CLIP) has demonstrated impressive capabilities in open-vocabulary classification. The class token in the image encoder is trained to capture the global features to distinguish different text…

Computer Vision and Pattern Recognition · Computer Science 2023-12-21 Yuqi Lin, Minghao Chen, Kaipeng Zhang, Hengjia Li, Mingming Li, Zheng Yang, Dongqin Lv, Binbin Lin, Haifeng Liu, Deng Cai

Multimodal Multilabel Classification by CLIP

Multimodal multilabel classification (MMC) is a challenging task that aims to design a learning algorithm to handle two data sources, the image and text, and learn a comprehensive semantic feature presentation across the modalities. In this…

Computer Vision and Pattern Recognition · Computer Science 2024-06-25 Yanming Guo

Transductive CLIP with Class-Conditional Contrastive Learning

Inspired by the remarkable zero-shot generalization capacity of vision-language pre-trained model, we seek to leverage the supervision from CLIP model to alleviate the burden of data labeling. However, such supervision inevitably contains…

Computer Vision and Pattern Recognition · Computer Science 2022-06-14 Junchu Huang, Weijie Chen, Shicai Yang, Di Xie, Shiliang Pu, Yueting Zhuang

CLIP-Decoder : ZeroShot Multilabel Classification using Multimodal CLIP Aligned Representation

Multi-label classification is an essential task utilized in a wide variety of real-world applications. Multi-label zero-shot learning is a method for classifying images into multiple unseen categories for which no training data is…

Computer Vision and Pattern Recognition · Computer Science 2024-06-24 Muhammad Ali, Salman Khan

Unsupervised Prompt Learning for Vision-Language Models

Contrastive vision-language models like CLIP have shown great progress in transfer learning. In the inference stage, the proper text description, also known as prompt, needs to be carefully designed to correctly classify the given images.…

Computer Vision and Pattern Recognition · Computer Science 2022-08-23 Tony Huang, Jack Chu, Fangyun Wei

CLISC: Bridging CLIP and SAM by Enhanced CAM for Unsupervised Brain Tumor Segmentation

Brain tumor segmentation is important for diagnosis of the tumor, and current deep-learning methods rely on a large set of annotated images for training, with high annotation costs. Unsupervised segmentation is promising to avoid human…

Computer Vision and Pattern Recognition · Computer Science 2025-01-28 Xiaochuan Ma, Jia Fu, Wenjun Liao, Shichuan Zhang, Guotai Wang

The Solution for Language-Enhanced Image New Category Discovery

Treating texts as images, combining prompts with textual labels for prompt tuning, and leveraging the alignment properties of CLIP have been successfully applied in zero-shot multi-label image recognition. Nonetheless, relying solely on…

Computer Vision and Pattern Recognition · Computer Science 2024-07-09 Haonan Xu, Dian Chao, Xiangyu Wu, Zhonghua Wan, Yang Yang

SuperCLIP: CLIP with Simple Classification Supervision

Contrastive Language-Image Pretraining (CLIP) achieves strong generalization in vision-language tasks by aligning images and texts in a shared embedding space. However, recent findings show that CLIP-like models still underutilize…

Computer Vision and Pattern Recognition · Computer Science 2025-12-17 Weiheng Zhao, Zilong Huang, Jiashi Feng, Xinggang Wang

Data-free Multi-label Image Recognition via LLM-powered Prompt Tuning

This paper proposes a novel framework for multi-label image recognition without any training data, called data-free framework, which uses knowledge of pre-trained Large Language Model (LLM) to learn prompts to adapt pretrained…

Computer Vision and Pattern Recognition · Computer Science 2024-03-05 Shuo Yang, Zirui Shang, Yongqi Wang, Derong Deng, Hongwei Chen, Qiyuan Cheng, Xinxiao Wu

Extract Free Dense Labels from CLIP

Contrastive Language-Image Pre-training (CLIP) has made a remarkable breakthrough in open-vocabulary zero-shot image recognition. Many recent studies leverage the pre-trained CLIP models for image-level classification and manipulation. In…

Computer Vision and Pattern Recognition · Computer Science 2022-07-28 Chong Zhou, Chen Change Loy, Bo Dai

CLIP-GCD: Simple Language Guided Generalized Category Discovery

Generalized Category Discovery (GCD) requires a model to both classify known categories and cluster unknown categories in unlabeled data. Prior methods leveraged self-supervised pre-training combined with supervised fine-tuning on the…

Computer Vision and Pattern Recognition · Computer Science 2023-05-18 Rabah Ouldnoughi, Chia-Wen Kuo, Zsolt Kira

Cooperative Pseudo Labeling for Unsupervised Federated Classification

Unsupervised Federated Learning (UFL) aims to collaboratively train a global model across distributed clients without sharing data or accessing label information. Previous UFL works have predominantly focused on representation learning and…

Computer Vision and Pattern Recognition · Computer Science 2025-10-14 Kuangpu Guo, Lijun Sheng, Yongcan Yu, Jian Liang, Zilei Wang, Ran He

Cross-Modal Self-Training: Aligning Images and Pointclouds to Learn Classification without Labels

Large-scale 2D vision-language models, such as CLIP, can be aligned with a 3D encoder to learn generalizable (open-vocabulary) 3D vision models. However, current methods require supervised pre-training for such alignment, and the…

Computer Vision and Pattern Recognition · Computer Science 2024-04-17 Amaya Dharmasiri, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

Self-Supervised Image Captioning with CLIP

Image captioning, a fundamental task in vision-language understanding, seeks to generate accurate natural language descriptions for provided images. Current image captioning approaches heavily rely on high-quality image-caption pairs, which…

Computer Vision and Pattern Recognition · Computer Science 2023-11-03 Chuanyang Jin