
Related papers: CDUL: CLIP-Driven Unsupervised Learning for Multi-…

200 papers

This report is a reproducibility study of the paper "CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification" (Abdelfattah et al., ICCV 2023). Our report makes the following contributions: (1) We provide a reproducible,…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-21 · Manan Shah, Yash Bhalgat

Multi-label classification is crucial for comprehensive image understanding, yet acquiring accurate annotations is challenging and costly. To address this, a recent study suggests exploiting unsupervised multi-label classification…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-24 · Dongseob Kim, Hyunjung Shim

State-of-the-art computer vision models are mostly trained with supervised learning using human-labeled images, which limits their scalability due to the expensive annotation cost. While self-supervised representation learning has achieved…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-13 · Junnan Li, Silvio Savarese, Steven C. H. Hoi

Human-centric visual analysis plays a pivotal role in diverse applications, including surveillance, healthcare, and human-computer interaction. With the emergence of large-scale unlabeled human image datasets, there is an increasing need…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-21 · Mingshuang Luo, Ruibing Hou, Bo Chao, Hong Chang, Zimo Liu, Yaowei Wang, Shiguang Shan

Fine-tuning vision-language models (VLMs) like CLIP to downstream tasks is often necessary to optimize their performance. However, a major obstacle is the limited availability of labeled data. We study the use of pseudolabels, i.e.,…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-11 · Cristina Menghini, Andrew Delworth, Stephen H. Bach

Contrastive Language Image Pre-training (CLIP) has recently demonstrated success across various tasks due to superior feature representation empowered by image-text contrastive learning. However, the instance discrimination method used by…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-07 · Xiang An, Kaicheng Yang, Xiangzi Dai, Ziyong Feng, Jiankang Deng

Contrastive Language-Image Pre-training (CLIP) has demonstrated impressive capabilities in open-vocabulary classification. The class token in the image encoder is trained to capture the global features to distinguish different text…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-21 · Yuqi Lin, Minghao Chen, Kaipeng Zhang, Hengjia Li, Mingming Li, Zheng Yang, Dongqin Lv, Binbin Lin, Haifeng Liu, Deng Cai

Multimodal multilabel classification (MMC) is a challenging task that aims to design a learning algorithm to handle two data sources, the image and text, and learn a comprehensive semantic feature representation across the modalities. In this…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-25 · Yanming Guo

Inspired by the remarkable zero-shot generalization capacity of vision-language pre-trained models, we seek to leverage the supervision from CLIP model to alleviate the burden of data labeling. However, such supervision inevitably contains…

Computer Vision and Pattern Recognition · Computer Science · 2022-06-14 · Junchu Huang, Weijie Chen, Shicai Yang, Di Xie, Shiliang Pu, Yueting Zhuang

Multi-label classification is an essential task utilized in a wide variety of real-world applications. Multi-label zero-shot learning is a method for classifying images into multiple unseen categories for which no training data is…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-24 · Muhammad Ali, Salman Khan

Contrastive vision-language models like CLIP have shown great progress in transfer learning. In the inference stage, the proper text description, also known as prompt, needs to be carefully designed to correctly classify the given images.…

Computer Vision and Pattern Recognition · Computer Science · 2022-08-23 · Tony Huang, Jack Chu, Fangyun Wei

Brain tumor segmentation is important for diagnosis of the tumor, and current deep-learning methods rely on a large set of annotated images for training, with high annotation costs. Unsupervised segmentation is promising to avoid human…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-28 · Xiaochuan Ma, Jia Fu, Wenjun Liao, Shichuan Zhang, Guotai Wang

Treating texts as images, combining prompts with textual labels for prompt tuning, and leveraging the alignment properties of CLIP have been successfully applied in zero-shot multi-label image recognition. Nonetheless, relying solely on…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-09 · Haonan Xu, Dian Chao, Xiangyu Wu, Zhonghua Wan, Yang Yang

Contrastive Language-Image Pretraining (CLIP) achieves strong generalization in vision-language tasks by aligning images and texts in a shared embedding space. However, recent findings show that CLIP-like models still underutilize…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-17 · Weiheng Zhao, Zilong Huang, Jiashi Feng, Xinggang Wang

This paper proposes a novel framework for multi-label image recognition without any training data, called the data-free framework, which uses knowledge of pre-trained Large Language Model (LLM) to learn prompts to adapt pretrained…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-05 · Shuo Yang, Zirui Shang, Yongqi Wang, Derong Deng, Hongwei Chen, Qiyuan Cheng, Xinxiao Wu

Contrastive Language-Image Pre-training (CLIP) has made a remarkable breakthrough in open-vocabulary zero-shot image recognition. Many recent studies leverage the pre-trained CLIP models for image-level classification and manipulation. In…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-28 · Chong Zhou, Chen Change Loy, Bo Dai

Generalized Category Discovery (GCD) requires a model to both classify known categories and cluster unknown categories in unlabeled data. Prior methods leveraged self-supervised pre-training combined with supervised fine-tuning on the…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-18 · Rabah Ouldnoughi, Chia-Wen Kuo, Zsolt Kira

Unsupervised Federated Learning (UFL) aims to collaboratively train a global model across distributed clients without sharing data or accessing label information. Previous UFL works have predominantly focused on representation learning and…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-14 · Kuangpu Guo, Lijun Sheng, Yongcan Yu, Jian Liang, Zilei Wang, Ran He

Large-scale 2D vision-language models, such as CLIP, can be aligned with a 3D encoder to learn generalizable (open-vocabulary) 3D vision models. However, current methods require supervised pre-training for such alignment, and the…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-17 · Amaya Dharmasiri, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan

Image captioning, a fundamental task in vision-language understanding, seeks to generate accurate natural language descriptions for provided images. Current image captioning approaches heavily rely on high-quality image-caption pairs, which…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-03 · Chuanyang Jin