Related papers: Retrieval-Augmented Fine-Tuning With Preference Op…

RPO: Fine-Tuning Visual Generative Models via Rich Vision-Language Preferences

Traditional preference tuning methods for LLMs/Visual Generative Models often rely solely on reward model labeling, which can be opaque, offer limited insights into the rationale behind preferences, and are prone to issues such as reward…

Machine Learning · Computer Science · 2026-01-13 · Hanyang Zhao, Haoxian Chen, Yucheng Guo, Genta Indra Winata, Tingting Ou, Ziyu Huang, David D. Yao, Wenpin Tang

Aligning Large Vision-Language Models by Deep Reinforcement Learning and Direct Preference Optimization

Large Vision-Language Models (LVLMs) or multimodal large language models represent a significant advancement in artificial intelligence, enabling systems to understand and generate content across both visual and textual modalities. While…

Machine Learning · Computer Science · 2025-09-09 · Thanh Thi Nguyen, Campbell Wilson, Janis Dalins

Aligning CodeLLMs with Direct Preference Optimization

The last year has witnessed the rapid progress of large language models (LLMs) across diverse domains. Among them, CodeLLMs have garnered particular attention because they can not only assist in completing various programming tasks but also…

Artificial Intelligence · Computer Science · 2024-10-25 · Yibo Miao, Bofei Gao, Shanghaoran Quan, Junyang Lin, Daoguang Zan, Jiaheng Liu, Jian Yang, Tianyu Liu, Zhijie Deng

Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization

The emergence of large Vision Language Models (VLMs) has broadened the scope and capabilities of single-modal Large Language Models (LLMs) by integrating visual modalities, thereby unlocking transformative cross-modal applications in a…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-23 · Shuo Xing, Peiran Li, Yuping Wang, Ruizheng Bai, Yueqi Wang, Chan-Wei Hu, Chengxuan Qian, Huaxiu Yao, Zhengzhong Tu

Differentiable Prompt Learning for Vision Language Models

Prompt learning is an effective way to exploit the potential of large-scale pre-trained foundational models. Continuous prompts parameterize context tokens in prompts by turning them into differentiable vectors. Deep continuous prompts…

Machine Learning · Computer Science · 2025-01-03 · Zhenhan Huang, Tejaswini Pedapati, Pin-Yu Chen, Jianxi Gao

Multi-modal Preference Alignment Remedies Degradation of Visual Instruction Tuning on Language Models

Multi-modal large language models (MLLMs) are expected to support multi-turn queries of interchanging image and text modalities in production. However, the current MLLMs trained with visual-question-answering (VQA) datasets could suffer…

Computation and Language · Computer Science · 2024-11-06 · Shengzhi Li, Rongyu Lin, Shichao Pei

Optimizing LLMs with Direct Preferences: A Data Efficiency Perspective

Aligning the output of Large Language Models (LLMs) with human preferences (e.g., by means of reinforcement learning with human feedback, or RLHF) is essential for ensuring their effectiveness in real-world scenarios. Despite significant…

Artificial Intelligence · Computer Science · 2024-10-23 · Pietro Bernardelle, Gianluca Demartini

M3PO: Multimodal-Model-Guided Preference Optimization for Visual Instruction Following

Large Vision-Language Models (LVLMs) hold immense potential for complex multimodal instruction following, yet their development is often hindered by the high cost and inconsistency of human annotation required for effective fine-tuning and…

Computation and Language · Computer Science · 2025-08-19 · Ruirui Gao, Emily Johnson, Bowen Tan, Yanfei Qian

VL-DPO: Vision-Language-Guided Finetuning for Preference-Aligned Autonomous Driving

The rapid growth of autonomous driving datasets has enabled the scaling of powerful motion forecasting models. While large-scale pretraining provides strong performance, the standard imitation objective may not fully capture the complex…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-20 · Zhefan Xu, Ghassen Jerfel, Marina Haliem, Qi Zhao, Jeonhyung Kang, Khaled S. Refaat

On the Role of Preference Variance in Preference Optimization

Direct Preference Optimization (DPO) has emerged as an important approach for learning from human preferences in aligning large language models (LLMs). However, collecting human preference data is costly and inefficient, motivating methods…

Computation and Language · Computer Science · 2025-12-01 · Jiacheng Guo, Zihao Li, Jiahao Qiu, Yue Wu, Mengdi Wang

ProAPO: Progressively Automatic Prompt Optimization for Visual Classification

Vision-language models (VLMs) have made significant progress in image classification by training with large-scale paired image-text data. Their performances largely depend on the prompt quality. While recent methods show that visual…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-12 · Xiangyan Qu, Gaopeng Gou, Jiamin Zhuang, Jing Yu, Kun Song, Qihao Wang, Yili Li, Gang Xiong

VLP: Vision-Language Preference Learning for Embodied Manipulation

Reward engineering is one of the key challenges in Reinforcement Learning (RL). Preference-based RL effectively addresses this issue by learning from human feedback. However, it is both time-consuming and expensive to collect human…

Machine Learning · Computer Science · 2025-02-18 · Runze Liu, Chenjia Bai, Jiafei Lyu, Shengjie Sun, Yali Du, Xiu Li

Parameter-Efficient Tuning Helps Language Model Alignment

Aligning large language models (LLMs) with human preferences is essential for safe and useful LLMs. Previous works mainly adopt reinforcement learning (RLHF) and direct preference optimization (DPO) with human feedback for alignment.…

Computation and Language · Computer Science · 2023-10-03 · Tianci Xue, Ziqi Wang, Heng Ji

Preference Packing: Efficient Preference Optimization for Large Language Models

Resource-efficient training optimization techniques are becoming increasingly important as the size of large language models (LLMs) continues to grow. In particular, batch packing is commonly used in pre-training and supervised fine-tuning…

Computation and Language · Computer Science · 2026-03-02 · Jaekyung Cho

Towards Improved Preference Optimization Pipeline: from Data Generation to Budget-Controlled Regularization

Direct Preference Optimization (DPO) and its variants have become the de facto standards for aligning large language models (LLMs) with human preferences or specific goals. However, DPO requires high-quality preference data and suffers from…

Machine Learning · Computer Science · 2024-11-12 · Zhuotong Chen, Fang Liu, Jennifer Zhu, Wanyu Du, Yanjun Qi

Preference VLM: Leveraging VLMs for Scalable Preference-Based Reinforcement Learning

Preference-based reinforcement learning (RL) offers a promising approach for aligning policies with human intent but is often constrained by the high cost of human feedback. In this work, we introduce PrefVLM, a framework that integrates…

Machine Learning · Computer Science · 2025-02-04 · Udita Ghosh, Dripta S. Raychaudhuri, Jiachen Li, Konstantinos Karydis, Amit Roy-Chowdhury

Toward Preference-aligned Large Language Models via Residual-based Model Steering

Preference alignment is a critical step in making Large Language Models (LLMs) useful and aligned with (human) preferences. Existing approaches such as Reinforcement Learning from Human Feedback or Direct Preference Optimization typically…

Computation and Language · Computer Science · 2025-09-30 · Lucio La Cava, Andrea Tagarelli

Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning

Effective training of language models (LMs) for mathematical reasoning tasks demands high-quality supervised fine-tuning data. Besides obtaining annotations from human experts, a common alternative is sampling from larger and more powerful…

Computation and Language · Computer Science · 2024-07-26 · Tianduo Wang, Shichen Li, Wei Lu

VaPR -- Vision-language Preference alignment for Reasoning

Preference finetuning methods like Direct Preference Optimization (DPO) with AI-generated feedback have shown promise in aligning Large Vision-Language Models (LVLMs) with human preferences. However, existing techniques overlook the…

Artificial Intelligence · Computer Science · 2025-10-03 · Rohan Wadhawan, Fabrice Y Harel-Canada, Zi-Yi Dou, Suhaila Shakiah, Robinson Piramuthu, Nanyun Peng

Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers

Automatic prompt optimization is an important approach to improving the performance of large language models (LLMs). Recent research demonstrates the potential of using LLMs as prompt optimizers, which can generate improved task prompts via…

Computation and Language · Computer Science · 2025-01-28 · Xinyu Tang, Xiaolei Wang, Wayne Xin Zhao, Siyuan Lu, Yaliang Li, Ji-Rong Wen