Related papers: Bayesian-guided Label Mapping for Visual Reprogram…

LAMM: Label Alignment for Multi-Modal Prompt Learning

With the success of pre-trained visual-language (VL) models such as CLIP in visual representation tasks, transferring pre-trained models to downstream tasks has become a crucial paradigm. Recently, the prompt tuning paradigm, which draws…

Computer Vision and Pattern Recognition · Computer Science 2023-12-14 Jingsheng Gao, Jiacheng Ruan, Suncheng Xiang, Zefang Yu, Ke Ji, Mingye Xie, Ting Liu, Yuzhuo Fu

Vision-Language Model Selection and Reuse for Downstream Adaptation

Pre-trained Vision-Language Models (VLMs) are becoming increasingly popular across various visual tasks, and several open-sourced VLM variants have been released. However, selecting the best-performing pre-trained VLM for a specific…

Machine Learning · Computer Science 2025-05-08 Hao-Zhe Tan, Zhi Zhou, Yu-Feng Li, Lan-Zhe Guo

Understanding and Improving Visual Prompting: A Label-Mapping Perspective

We revisit and advance visual prompting (VP), an input prompting technique for vision tasks. VP can reprogram a fixed, pre-trained source model to accomplish downstream tasks in the target domain by simply incorporating universal prompts…

Computer Vision and Pattern Recognition · Computer Science 2023-03-28 Aochuan Chen, Yuguang Yao, Pin-Yu Chen, Yihua Zhang, Sijia Liu

Bridging Vision and Language Spaces with Assignment Prediction

This paper introduces VLAP, a novel approach that bridges pretrained vision models and large language models (LLMs) to make frozen LLMs understand the visual world. VLAP transforms the embedding space of pretrained vision models into the…

Computer Vision and Pattern Recognition · Computer Science 2024-04-16 Jungin Park, Jiyoung Lee, Kwanghoon Sohn

Bayesian Statistics Guided Label Refurbishment Mechanism: Mitigating Label Noise in Medical Image Classification

Purpose: Deep neural networks (DNNs) have been widely applied in medical image classification, benefiting from their powerful mapping capability among medical images. However, these existing deep learning-based methods depend on an enormous…

Computer Vision and Pattern Recognition · Computer Science 2022-10-12 Mengdi Gao, Ximeng Feng, Mufeng Geng, Zhe Jiang, Lei Zhu, Xiangxi Meng, Chuanqing Zhou, Qiushi Ren, Yanye Lu

Understanding Model Reprogramming for CLIP via Decoupling Visual Prompts

Model reprogramming adapts pretrained models to downstream tasks by modifying only the input and output spaces. Visual reprogramming (VR) is one instance for vision tasks that adds a trainable noise pattern (i.e., a visual prompt) to input…

Machine Learning · Computer Science 2025-06-03 Chengyi Cai, Zesheng Ye, Lei Feng, Jianzhong Qi, Feng Liu

Text-Region Matching for Multi-Label Image Recognition with Missing Labels

Recently, large-scale visual language pre-trained (VLP) models have demonstrated impressive performance across various downstream tasks. Motivated by these advancements, pioneering efforts have emerged in multi-label image recognition with…

Computer Vision and Pattern Recognition · Computer Science 2024-08-30 Leilei Ma, Hongxing Xie, Lei Wang, Yanping Fu, Dengdi Sun, Haifeng Zhao

Tuning Vision-Language Models with Candidate Labels by Prompt Alignment

Vision-language models (VLMs) can learn high-quality representations from a large-scale training dataset of image-text pairs. Prompt learning is a popular approach to fine-tuning VLMs to adapt them to downstream tasks. Despite the satisfying…

Computer Vision and Pattern Recognition · Computer Science 2024-12-31 Zhifang Zhang, Yuwei Niu, Xin Liu, Beibei Li

Bayesian Relational Memory for Semantic Visual Navigation

We introduce a new memory architecture, Bayesian Relational Memory (BRM), to improve the generalization ability for semantic visual navigation agents in unseen environments, where an agent is given a semantic target to navigate towards. BRM…

Computer Vision and Pattern Recognition · Computer Science 2019-09-11 Yi Wu, Yuxin Wu, Aviv Tamar, Stuart Russell, Georgia Gkioxari, Yuandong Tian

Make Prompts Adaptable: Bayesian Modeling for Vision-Language Prompt Learning with Data-Dependent Prior

Recent Vision-Language Pretrained (VLP) models have become the backbone for many downstream tasks, but they are utilized as frozen models without learning. Prompt learning is a method to improve the pre-trained VLP model by adding a…

Computation and Language · Computer Science 2024-01-17 Youngjae Cho, HeeSun Bae, Seungjae Shin, Yeo Dong Youn, Weonyoung Joo, Il-Chul Moon

Medical Vision Language Pretraining: A survey

Medical Vision Language Pretraining (VLP) has recently emerged as a promising solution to the scarcity of labeled data in the medical domain. By leveraging paired/unpaired vision and text datasets through self-supervised learning, models…

Computer Vision and Pattern Recognition · Computer Science 2023-12-12 Prashant Shrestha, Sanskar Amgain, Bidur Khanal, Cristian A. Linte, Binod Bhattarai

Unified Visual Relationship Detection with Vision and Language Models

This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets. Merging labels spanning different datasets could be challenging due to inconsistent taxonomies. The issue…

Computer Vision and Pattern Recognition · Computer Science 2023-08-22 Long Zhao, Liangzhe Yuan, Boqing Gong, Yin Cui, Florian Schroff, Ming-Hsuan Yang, Hartwig Adam, Ting Liu

Attribute-based Visual Reprogramming for Vision-Language Models

Visual reprogramming (VR) reuses pre-trained vision models for downstream image classification tasks by adding trainable noise patterns to inputs. When applied to vision-language models (e.g., CLIP), existing VR approaches follow the same…

Computer Vision and Pattern Recognition · Computer Science 2025-02-26 Chengyi Cai, Zesheng Ye, Lei Feng, Jianzhong Qi, Feng Liu

BendVLM: Test-Time Debiasing of Vision-Language Embeddings

Vision-language model (VLM) embeddings have been shown to encode biases present in their training data, such as societal biases that prescribe negative characteristics to members of various racial and gender identities. VLMs are being…

Computer Vision and Pattern Recognition · Computer Science 2024-11-08 Walter Gerych, Haoran Zhang, Kimia Hamidieh, Eileen Pan, Maanas Sharma, Thomas Hartvigsen, Marzyeh Ghassemi

MLIM: Vision-and-Language Model Pre-training with Masked Language and Image Modeling

Vision-and-Language Pre-training (VLP) improves model performance for downstream tasks that require image and text inputs. Current VLP approaches differ on (i) model architecture (especially image embedders), (ii) loss functions, and (iii)…

Computer Vision and Pattern Recognition · Computer Science 2021-09-28 Tarik Arici, Mehmet Saygin Seyfioglu, Tal Neiman, Yi Xu, Son Tran, Trishul Chilimbi, Belinda Zeng, Ismail Tutar

Empowering Semantic-Sensitive Underwater Image Enhancement with VLM

In recent years, learning-based underwater image enhancement (UIE) techniques have rapidly evolved. However, distribution shifts between high-quality enhanced outputs and natural images can hinder semantic cue extraction for downstream…

Computer Vision and Pattern Recognition · Computer Science 2026-03-16 Guodong Fan, Shengning Zhou, Genji Yuan, Huiyu Li, Jingchun Zhou, Jinjiang Li

BEV-VLM: Trajectory Planning via Unified BEV Abstraction

This paper introduces BEV-VLM, a novel approach for trajectory planning in autonomous driving that leverages Vision-Language Models (VLMs) with Bird's-Eye View (BEV) feature maps as visual input. Unlike conventional trajectory planning…

Robotics · Computer Science 2026-03-02 Guancheng Chen, Sheng Yang, Tong Zhan, Jian Wang

Enhancing Visual Programming for Visual Reasoning via Probabilistic Graphs

Recently, Visual Programming (VP) based on large language models (LLMs) has rapidly developed and demonstrated significant potential in complex Visual Reasoning (VR) tasks. Previous works to enhance VP have primarily focused on improving…

Computer Vision and Pattern Recognition · Computer Science 2025-12-17 Wentao Wan, Kaiyu Wu, Qingyang Ma, Nan Kang, Yunjie Chen, Liang Lin, Keze Wang

Optimising Language Models for Downstream Tasks: A Post-Training Perspective

Language models (LMs) have demonstrated remarkable capabilities in NLP, yet adapting them efficiently and robustly to specific tasks remains challenging. As their scale and complexity grow, fine-tuning LMs on labelled data often…

Computation and Language · Computer Science 2025-06-27 Zhengyan Shi

Vision-Language Modeling Meets Remote Sensing: Models, Datasets and Perspectives

Vision-language modeling (VLM) aims to bridge the information gap between images and natural language. Under the new paradigm of first pre-training on massive image-text pairs and then fine-tuning on task-specific data, VLM in the remote…

Computer Vision and Pattern Recognition · Computer Science 2025-06-11 Xingxing Weng, Chao Pang, Gui-Song Xia