Related papers: MSGCoOp: Multiple Semantic-Guided Context Optimiza…

Conditional Prompt Learning for Vision-Language Models

With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential to investigate ways to adapt these models to downstream datasets. A recently proposed method named Context Optimization (CoOp) introduces the…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-07 · Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu

Visual-Language Prompt Tuning with Knowledge-guided Context Optimization

Prompt tuning is an effective way to adapt the pre-trained visual-language model (VLM) to the downstream task using task-related textual tokens. Representative CoOp-based work combines the learnable textual tokens with the class tokens to…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-24 · Hantao Yao, Rui Zhang, Changsheng Xu

Learning to Prompt for Vision-Language Models

Large pre-trained vision-language models like CLIP have shown great potential in learning representations that are transferable across a wide range of downstream tasks. Different from the traditional representation learning that is based…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-07 · Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu

MMLoP: Multi-Modal Low-Rank Prompting for Efficient Vision-Language Adaptation

Prompt learning has become a dominant paradigm for adapting vision-language models (VLMs) such as CLIP to downstream tasks without modifying pretrained weights. While extending prompts to both vision and text encoders across multiple…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-26 · Sajjad Ghiasvand, Haniyeh Ehsani Oskouie, Mahnoosh Alizadeh, Ramtin Pedarsani

Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models

Prompt learning has become one of the most efficient paradigms for adapting large pre-trained vision-language models to downstream tasks. Current state-of-the-art methods, like CoOp and ProDA, tend to adopt soft prompts to learn an…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-31 · Sifan Long, Zhen Zhao, Junkun Yuan, Zichang Tan, Jiangjiang Liu, Luping Zhou, Shengsheng Wang, Jingdong Wang

ECO: Ensembling Context Optimization for Vision-Language Models

Image recognition has recently witnessed a paradigm shift, where vision-language models are now used to perform few-shot classification based on textual prompts. Among these, the CLIP model has shown remarkable capabilities for zero-shot…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-27 · Lorenzo Agnolucci, Alberto Baldrati, Francesco Todino, Federico Becattini, Marco Bertini, Alberto Del Bimbo

GroupCoOp: Group-robust Fine-tuning via Group Prompt Learning

Parameter-efficient fine-tuning (PEFT) of vision-language models (VLMs) excels in various vision tasks thanks to the rich knowledge and generalization ability of VLMs. However, recent studies revealed that such fine-tuned VLMs are…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-30 · Nayeong Kim, Seong Joon Oh, Suha Kwak

Domain-Invariant Prompt Learning for Vision-Language Models

Large pre-trained vision-language models like CLIP have transformed computer vision by aligning images and text in a shared feature space, enabling robust zero-shot transfer via prompting. Soft-prompting, such as Context Optimization…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-31 · Arsham Gholamzadeh Khoee, Yinan Yu, Robert Feldt

Context-Aware Prompt Tuning for Vision-Language Model with Dual-Alignment

Large-scale vision-language models (VLMs), e.g., CLIP, learn broad visual concepts from tedious training data, showing superb generalization ability. A number of prompt learning methods have been proposed to efficiently adapt the VLMs to…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-11 · Hongyu Hu, Tiancheng Lin, Jie Wang, Zhenbang Sun, Yi Xu

Optimization of Prompt Learning via Multi-Knowledge Representation for Vision-Language Models

Vision-Language Models (VLMs), such as CLIP, play a foundational role in various cross-modal applications. To fully leverage VLMs' potential in adapting to downstream tasks, context optimization methods like Prompt Tuning are essential.…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-19 · Enming Zhang, Bingke Zhu, Yingying Chen, Qinghai Miao, Ming Tang, Jinqiao Wang

Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models

Pretrained vision-language models (VLMs) such as CLIP have shown impressive generalization capability in downstream vision tasks with appropriate text prompts. Instead of designing prompts manually, Context Optimization (CoOp) has been…

Computer Vision and Pattern Recognition · Computer Science · 2023-02-15 · Chengcheng Ma, Yang Liu, Jiankang Deng, Lingxi Xie, Weiming Dong, Changsheng Xu

Multiple Stochastic Prompt Tuning for Few-shot Adaptation under Extreme Domain Shift

Foundation Vision-Language Models (VLMs) like CLIP exhibit strong generalization capabilities due to large-scale pretraining on diverse image-text pairs. However, their performance often degrades when applied to target datasets with…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-13 · Debarshi Brahma, Soma Biswas

BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models

Recent advancements in vision-language models (VLMs), such as CLIP, have demonstrated substantial success in self-supervised representation learning for vision tasks. However, effectively adapting VLMs to downstream applications remains…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-13 · Taha Koleilat, Hojat Asgariandehkordi, Hassan Rivaz, Yiming Xiao

Concept-Guided Prompt Learning for Generalization in Vision-Language Models

Contrastive Language-Image Pretraining (CLIP) model has exhibited remarkable efficacy in establishing cross-modal connections between texts and images, yielding impressive performance across a broad spectrum of downstream applications…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-17 · Yi Zhang, Ce Zhang, Ke Yu, Yushun Tang, Zhihai He

Accelerating Conditional Prompt Learning via Masked Image Modeling for Vision-Language Models

Vision-language models (VLMs) like CLIP excel in zero-shot learning but often require resource-intensive training to adapt to new tasks. Prompt learning techniques, such as CoOp and CoCoOp, offer efficient adaptation but tend to overfit to…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-08 · Phuoc-Nguyen Bui, Khanh-Binh Nguyen, Hyunseung Choo

VaMP: Variational Multi-Modal Prompt Learning for Vision-Language Models

Vision-language models (VLMs), such as CLIP, have shown strong generalization under zero-shot settings, yet adapting them to downstream tasks with limited supervision remains a significant challenge. Existing multi-modal prompt learning…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-01 · Silin Cheng, Kai Han

Multi-modal Mutual-Guidance Conditional Prompt Learning for Vision-Language Models

Prompt learning facilitates the efficient adaptation of Vision-Language Models (VLMs) to various downstream tasks. However, it faces two significant challenges: (1) inadequate modeling of class embedding distributions for unseen instances,…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-14 · Shijun Yang, Xiang Zhang, Wanqing Zhao, Hangzai Luo, Sheng Zhong, Jinye Peng, Jianping Fan

Prompt-OT: An Optimal Transport Regularization Paradigm for Knowledge Preservation in Vision-Language Model Adaptation

Vision-language models (VLMs) such as CLIP demonstrate strong performance but struggle when adapted to downstream tasks. Prompt learning has emerged as an efficient and effective strategy to adapt VLMs while preserving their pre-trained…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-02 · Xiwen Chen, Wenhui Zhu, Peijie Qiu, Hao Wang, Huayu Li, Haiyu Wu, Aristeidis Sotiras, Yalin Wang, Abolfazl Razi

Local-Global Prompt Learning via Sparse Optimal Transport

Few-shot adaptation of vision-language models (VLMs) like CLIP typically relies on learning textual prompts matched to global image embeddings. Recent works extend this paradigm by incorporating local image-text alignment to capture…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-10 · Deniz Kizaroğlu, Ülku Tuncer Küçüktas, Emre Çakmakyurdu, Alptekin Temizel

Prompt Tuning with Soft Context Sharing for Vision-Language Models

Vision-language models have recently shown great potential on many tasks in computer vision. Meanwhile, prior work demonstrates prompt tuning designed for vision-language models could acquire superior performance on few-shot image…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-15 · Kun Ding, Ying Wang, Pengzhang Liu, Qiang Yu, Haojian Zhang, Shiming Xiang, Chunhong Pan