Related papers: Self-Supervised Visual Preference Alignment

200 papers

Recent advances in video-large language models (Video-LLMs) have led to significant progress in video understanding. Current preference optimization methods often rely on proprietary APIs or human-annotated captions to generate preference…

Computer Vision and Pattern Recognition · Computer Science 2025-08-12 Yogesh Kulkarni, Pooyan Fazli

Large Vision-Language Models (LVLMs) have shown promising capabilities in understanding and generating information by integrating both visual and textual data. However, current models are still prone to hallucinations, which degrade the…

Computer Vision and Pattern Recognition · Computer Science 2025-11-13 Robert Wijaya, Ngoc-Bao Nguyen, Ngai-Man Cheung

Large vision-language models (LVLMs) often fail to align with human preferences, leading to issues like generating misleading content without proper visual context (also known as hallucination). A promising solution to this problem is using…

Computer Vision and Pattern Recognition · Computer Science 2025-02-03 Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Murun Yang, Qiaozhi He, Tong Xiao, Chunliang Zhang, Tongran Liu, Quan Du, Di Yang, Jingbo Zhu

Large Visual Language Models (LVLMs) increasingly rely on preference alignment to ensure reliability, which steers the model behavior via preference fine-tuning on preference data structured as "image - winner text - loser text" triplets.…

Computer Vision and Pattern Recognition · Computer Science 2025-03-10 Kejia Chen, Jiawen Zhang, Jiacong Hu, Jiazhen Yang, Jian Lou, Zunlei Feng, Mingli Song

Traditional alignment methods for Large Vision and Language Models (LVLMs) primarily rely on human-curated preference data. Human-generated preference data is costly; machine-generated preference data is limited in quality; and…

Computer Vision and Pattern Recognition · Computer Science 2025-09-03 Jefferson Hernandez, Jing Shi, Simon Jenni, Vicente Ordonez, Kushal Kafle

Despite recent advances in Vision-Language Models (VLMs), they may over-rely on visual language priors existing in their training data rather than true visual reasoning. To investigate this, we introduce ViLP, a benchmark featuring…

Computer Vision and Pattern Recognition · Computer Science 2025-04-15 Tiange Luo, Ang Cao, Gunhee Lee, Justin Johnson, Honglak Lee

Preference-based reinforcement learning (RL) offers a promising approach for aligning policies with human intent but is often constrained by the high cost of human feedback. In this work, we introduce PrefVLM, a framework that integrates…

Machine Learning · Computer Science 2025-02-04 Udita Ghosh, Dripta S. Raychaudhuri, Jiachen Li, Konstantinos Karydis, Amit Roy-Chowdhury

Vision-language models (VLMs) have demonstrated remarkable potential in integrating visual and linguistic information, but their performance is often constrained by the need for extensive, high-quality image-text training data. Curation of…

Computer Vision and Pattern Recognition · Computer Science 2025-05-20 Giorgio Giannone, Ruoteng Li, Qianli Feng, Evgeny Perevodchikov, Rui Chen, Aleix Martinez

Large vision-language models (LVLMs) have shown promise in a broad range of vision-language tasks with their strong reasoning and generalization capabilities. However, they require considerable computational resources for training and…

Computation and Language · Computer Science 2024-06-18 Guiming Hardy Chen, Shunian Chen, Ruifei Zhang, Junying Chen, Xiangbo Wu, Zhiyi Zhang, Zhihong Chen, Jianquan Li, Xiang Wan, Benyou Wang

Preference finetuning methods like Direct Preference Optimization (DPO) with AI-generated feedback have shown promise in aligning Large Vision-Language Models (LVLMs) with human preferences. However, existing techniques overlook the…

Artificial Intelligence · Computer Science 2025-10-03 Rohan Wadhawan, Fabrice Y Harel-Canada, Zi-Yi Dou, Suhaila Shakiah, Robinson Piramuthu, Nanyun Peng

Instruction-following Vision Large Language Models (VLLMs) have achieved significant progress recently on a variety of tasks. These approaches merge strong pre-trained vision models and large language models (LLMs). Since these components…

Machine Learning · Computer Science 2024-02-20 Yiyang Zhou, Chenhang Cui, Rafael Rafailov, Chelsea Finn, Huaxiu Yao

Inspired by text prompts in large language models, visual prompts have been explored to enhance the perceptual capabilities of large vision-language models (LVLMs). However, performance tends to saturate under single visual prompt designs,…

Computer Vision and Pattern Recognition · Computer Science 2026-03-06 Yuan Zhang, Chun-Kai Fan, Sicheng Yu, Junwen Pan, Tao Huang, Ming Lu, Kuan Cheng, Qi She, Shanghang Zhang

Human preference alignment can greatly enhance Multimodal Large Language Models (MLLMs), but collecting high-quality preference data is costly. A promising solution is the self-evolution strategy, where models are iteratively trained on…

Machine Learning · Computer Science 2024-12-23 Wentao Tan, Qiong Cao, Yibing Zhan, Chao Xue, Changxing Ding

Vision-language models (VLMs) have made significant progress in image classification by training with large-scale paired image-text data. Their performance largely depends on prompt quality. While recent methods show that visual…

Computer Vision and Pattern Recognition · Computer Science 2026-02-12 Xiangyan Qu, Gaopeng Gou, Jiamin Zhuang, Jing Yu, Kun Song, Qihao Wang, Yili Li, Gang Xiong

We present SelfPrompt, a novel prompt-tuning approach for vision-language models (VLMs) in a semi-supervised learning setup. Existing methods for tuning VLMs in semi-supervised setups struggle with the negative impact of the miscalibrated…

Computer Vision and Pattern Recognition · Computer Science 2025-01-30 Shuvendu Roy, Ali Etemad

The emergence of large Vision Language Models (VLMs) has broadened the scope and capabilities of single-modal Large Language Models (LLMs) by integrating visual modalities, thereby unlocking transformative cross-modal applications in a…

Computer Vision and Pattern Recognition · Computer Science 2025-09-23 Shuo Xing, Peiran Li, Yuping Wang, Ruizheng Bai, Yueqi Wang, Chan-Wei Hu, Chengxuan Qian, Huaxiu Yao, Zhengzhong Tu

Data selection in instruction tuning emerges as a pivotal process for acquiring high-quality data and training instruction-following large language models (LLMs), but it is still a new and unexplored research area for vision-language models…

Computation and Language · Computer Science 2024-02-21 Ruibo Chen, Yihan Wu, Lichang Chen, Guodong Liu, Qi He, Tianyi Xiong, Chenxi Liu, Junfeng Guo, Heng Huang

The development of large vision-language models (LVLMs) offers the potential to address challenges faced by traditional multimodal recommendations thanks to their proficient understanding of static images and textual dynamics. However, the…

Artificial Intelligence · Computer Science 2024-02-14 Yuqing Liu, Yu Wang, Lichao Sun, Philip S. Yu

In this paper, we introduce MultiviewVLM, a vision-language model designed for unsupervised contrastive multiview representation learning of facial emotions from 3D/4D data. Our architecture integrates pseudo-labels derived from generated…

Computer Vision and Pattern Recognition · Computer Science 2025-05-15 Muzammil Behzad

Large Vision-Language Models (LVLMs) typically follow a two-stage training paradigm: pretraining and supervised fine-tuning. Recently, preference optimization, derived from the language domain, has emerged as an effective post-training…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Yufei Zhan, Yousong Zhu, Shurong Zheng, Hongyin Zhao, Fan Yang, Ming Tang, Jinqiao Wang