Related papers: Conditional Prompt Learning for Vision-Language Mo…

Learning to Prompt for Vision-Language Models

Large pre-trained vision-language models like CLIP have shown great potential in learning representations that are transferable across a wide range of downstream tasks. Different from the traditional representation learning that is based…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-07 · Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu

Domain-Invariant Prompt Learning for Vision-Language Models

Large pre-trained vision-language models like CLIP have transformed computer vision by aligning images and text in a shared feature space, enabling robust zero-shot transfer via prompting. Soft-prompting, such as Context Optimization…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-31 · Arsham Gholamzadeh Khoee, Yinan Yu, Robert Feldt

Visual-Language Prompt Tuning with Knowledge-guided Context Optimization

Prompt tuning is an effective way to adapt the pre-trained visual-language model (VLM) to the downstream task using task-related textual tokens. Representative CoOp-based work combines the learnable textual tokens with the class tokens to…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-24 · Hantao Yao, Rui Zhang, Changsheng Xu

Compositional Kronecker Context Optimization for Vision-Language Models

Context Optimization (CoOp) has emerged as a simple yet effective technique for adapting CLIP-like vision-language models to downstream image recognition tasks. Nevertheless, learning compact context with satisfactory base-to-new, domain…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-15 · Kun Ding, Xiaohui Li, Qiang Yu, Ying Wang, Haojian Zhang, Shiming Xiang

Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models

Prompt learning has become one of the most efficient paradigms for adapting large pre-trained vision-language models to downstream tasks. Current state-of-the-art methods, like CoOp and ProDA, tend to adopt soft prompts to learn an…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-31 · Sifan Long, Zhen Zhao, Junkun Yuan, Zichang Tan, Jiangjiang Liu, Luping Zhou, Shengsheng Wang, Jingdong Wang

MSGCoOp: Multiple Semantic-Guided Context Optimization for Few-Shot Learning

Vision-language pre-trained models (VLMs) such as CLIP have demonstrated remarkable zero-shot generalization, and prompt learning has emerged as an efficient alternative to full fine-tuning. However, existing methods often struggle with…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-30 · Zhaolong Wang, Tongfeng Sun, Mingzheng Du, Yachao Huang

Understanding and Mitigating Overfitting in Prompt Tuning for Vision-Language Models

Pretrained vision-language models (VLMs) such as CLIP have shown impressive generalization capability in downstream vision tasks with appropriate text prompts. Instead of designing prompts manually, Context Optimization (CoOp) has been…

Computer Vision and Pattern Recognition · Computer Science · 2023-02-15 · Chengcheng Ma, Yang Liu, Jiankang Deng, Lingxi Xie, Weiming Dong, Changsheng Xu

BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models

Recent advancements in vision-language models (VLMs), such as CLIP, have demonstrated substantial success in self-supervised representation learning for vision tasks. However, effectively adapting VLMs to downstream applications remains…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-13 · Taha Koleilat, Hojat Asgariandehkordi, Hassan Rivaz, Yiming Xiao

AAPL: Adding Attributes to Prompt Learning for Vision-Language Models

Recent advances in large pre-trained vision-language models have demonstrated remarkable performance on zero-shot downstream tasks. Building upon this, recent studies, such as CoOp and CoCoOp, have proposed the use of prompt learning, where…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-26 · Gahyeon Kim, Sohee Kim, Seokju Lee

CoPL: Contextual Prompt Learning for Vision-Language Understanding

Recent advances in multimodal learning have resulted in powerful vision-language models, whose representations are generalizable across a variety of downstream tasks. Recently, their generalization ability has been further extended by…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-13 · Koustava Goswami, Srikrishna Karanam, Prateksha Udhayanan, K J Joseph, Balaji Vasan Srinivasan

Concept-Guided Prompt Learning for Generalization in Vision-Language Models

The Contrastive Language-Image Pretraining (CLIP) model has exhibited remarkable efficacy in establishing cross-modal connections between texts and images, yielding impressive performance across a broad spectrum of downstream applications…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-17 · Yi Zhang, Ce Zhang, Ke Yu, Yushun Tang, Zhihai He

Accelerating Conditional Prompt Learning via Masked Image Modeling for Vision-Language Models

Vision-language models (VLMs) like CLIP excel in zero-shot learning but often require resource-intensive training to adapt to new tasks. Prompt learning techniques, such as CoOp and CoCoOp, offer efficient adaptation but tend to overfit to…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-08 · Phuoc-Nguyen Bui, Khanh-Binh Nguyen, Hyunseung Choo

Optimization of Prompt Learning via Multi-Knowledge Representation for Vision-Language Models

Vision-Language Models (VLMs), such as CLIP, play a foundational role in various cross-modal applications. To fully leverage VLMs' potential in adapting to downstream tasks, context optimization methods like Prompt Tuning are essential.…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-19 · Enming Zhang, Bingke Zhu, Yingying Chen, Qinghai Miao, Ming Tang, Jinqiao Wang

CoDoL: Conditional Domain Prompt Learning for Out-of-Distribution Generalization

Recent advances in pre-training vision-language models (VLMs), e.g., contrastive language-image pre-training (CLIP) methods, have shown great potential in learning out-of-distribution (OOD) representations. Despite showing competitive…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-22 · Min Zhang, Bo Jiang, Jie Zhou, Yimeng Liu, Xin Lin

Towards Compatible Fine-tuning for Vision-Language Model Updates

So far, efficient fine-tuning has become a popular strategy for enhancing the capabilities of foundation models on downstream tasks by learning plug-and-play modules. However, existing methods overlook a crucial issue: if the underlying…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-31 · Zhengbo Wang, Jian Liang, Lijun Sheng, Ran He, Zilei Wang, Tieniu Tan

Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models

Pre-trained vision-language models, e.g., CLIP, working with manually designed prompts have demonstrated a great capacity for transfer learning. Recently, learnable prompts have achieved state-of-the-art performance, but they are prone to…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-23 · Baoshuo Kan, Teng Wang, Wenpeng Lu, Xiantong Zhen, Weili Guan, Feng Zheng

PRE: Vision-Language Prompt Learning with Reparameterization Encoder

Large pre-trained vision-language models such as CLIP have demonstrated great potential in zero-shot transferability to downstream tasks. However, to attain optimal performance, the manual selection of prompts is necessary to improve…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-17 · Thi Minh Anh Pham, An Duc Nguyen, Cephas Svosve, Vasileios Argyriou, Georgios Tzimiropoulos

Learning Domain Invariant Prompt for Vision-Language Models

Prompt learning is one of the most effective and trending ways to adapt powerful vision-language foundation models like CLIP to downstream datasets by tuning learnable prompt vectors with very few samples. However, although prompt learning…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-03 · Cairong Zhao, Yubin Wang, Xinyang Jiang, Yifei Shen, Kaitao Song, Dongsheng Li, Duoqian Miao

Consistency-guided Prompt Learning for Vision-Language Models

We propose Consistency-guided Prompt learning (CoPrompt), a new fine-tuning method for vision-language models. Our approach improves the generalization of large foundation models when fine-tuned on downstream tasks in a few-shot setting.…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-06 · Shuvendu Roy, Ali Etemad

CLIP-Adapter: Better Vision-Language Models with Feature Adapters

Large-scale contrastive vision-language pre-training has shown significant progress in visual representation learning. Unlike traditional visual systems trained by a fixed set of discrete labels, a new paradigm was introduced in…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-26 · Peng Gao, Shijie Geng, Renrui Zhang, Teli Ma, Rongyao Fang, Yongfeng Zhang, Hongsheng Li, Yu Qiao