Related papers: Retrieval-Augmented Fine-Tuning With Preference Op…

200 papers

Traditional preference tuning methods for LLMs/Visual Generative Models often rely solely on reward model labeling, which can be opaque, offer limited insights into the rationale behind preferences, and are prone to issues such as reward…

Machine Learning · Computer Science 2026-01-13 Hanyang Zhao, Haoxian Chen, Yucheng Guo, Genta Indra Winata, Tingting Ou, Ziyu Huang, David D. Yao, Wenpin Tang

Large Vision-Language Models (LVLMs) or multimodal large language models represent a significant advancement in artificial intelligence, enabling systems to understand and generate content across both visual and textual modalities. While…

Machine Learning · Computer Science 2025-09-09 Thanh Thi Nguyen, Campbell Wilson, Janis Dalins

The last year has witnessed the rapid progress of large language models (LLMs) across diverse domains. Among them, CodeLLMs have garnered particular attention because they can not only assist in completing various programming tasks but also…

Artificial Intelligence · Computer Science 2024-10-25 Yibo Miao, Bofei Gao, Shanghaoran Quan, Junyang Lin, Daoguang Zan, Jiaheng Liu, Jian Yang, Tianyu Liu, Zhijie Deng

The emergence of large Vision Language Models (VLMs) has broadened the scope and capabilities of single-modal Large Language Models (LLMs) by integrating visual modalities, thereby unlocking transformative cross-modal applications in a…

Computer Vision and Pattern Recognition · Computer Science 2025-09-23 Shuo Xing, Peiran Li, Yuping Wang, Ruizheng Bai, Yueqi Wang, Chan-Wei Hu, Chengxuan Qian, Huaxiu Yao, Zhengzhong Tu

Prompt learning is an effective way to exploit the potential of large-scale pre-trained foundational models. Continuous prompts parameterize context tokens in prompts by turning them into differentiable vectors. Deep continuous prompts…

Machine Learning · Computer Science 2025-01-03 Zhenhan Huang, Tejaswini Pedapati, Pin-Yu Chen, Jianxi Gao

Multi-modal large language models (MLLMs) are expected to support multi-turn queries of interchanging image and text modalities in production. However, the current MLLMs trained with visual-question-answering (VQA) datasets could suffer…

Computation and Language · Computer Science 2024-11-06 Shengzhi Li, Rongyu Lin, Shichao Pei

Aligning the output of Large Language Models (LLMs) with human preferences (e.g., by means of reinforcement learning with human feedback, or RLHF) is essential for ensuring their effectiveness in real-world scenarios. Despite significant…

Artificial Intelligence · Computer Science 2024-10-23 Pietro Bernardelle, Gianluca Demartini

Large Vision-Language Models (LVLMs) hold immense potential for complex multimodal instruction following, yet their development is often hindered by the high cost and inconsistency of human annotation required for effective fine-tuning and…

Computation and Language · Computer Science 2025-08-19 Ruirui Gao, Emily Johnson, Bowen Tan, Yanfei Qian

The rapid growth of autonomous driving datasets has enabled the scaling of powerful motion forecasting models. While large-scale pretraining provides strong performance, the standard imitation objective may not fully capture the complex…

Computer Vision and Pattern Recognition · Computer Science 2026-05-20 Zhefan Xu, Ghassen Jerfel, Marina Haliem, Qi Zhao, Jeonhyung Kang, Khaled S. Refaat

Direct Preference Optimization (DPO) has emerged as an important approach for learning from human preferences in aligning large language models (LLMs). However, collecting human preference data is costly and inefficient, motivating methods…

Computation and Language · Computer Science 2025-12-01 Jiacheng Guo, Zihao Li, Jiahao Qiu, Yue Wu, Mengdi Wang

Vision-language models (VLMs) have made significant progress in image classification by training with large-scale paired image-text data. Their performances largely depend on the prompt quality. While recent methods show that visual…

Computer Vision and Pattern Recognition · Computer Science 2026-02-12 Xiangyan Qu, Gaopeng Gou, Jiamin Zhuang, Jing Yu, Kun Song, Qihao Wang, Yili Li, Gang Xiong

Reward engineering is one of the key challenges in Reinforcement Learning (RL). Preference-based RL effectively addresses this issue by learning from human feedback. However, it is both time-consuming and expensive to collect human…

Machine Learning · Computer Science 2025-02-18 Runze Liu, Chenjia Bai, Jiafei Lyu, Shengjie Sun, Yali Du, Xiu Li

Aligning large language models (LLMs) with human preferences is essential for safe and useful LLMs. Previous works mainly adopt reinforcement learning (RLHF) and direct preference optimization (DPO) with human feedback for alignment.…

Computation and Language · Computer Science 2023-10-03 Tianci Xue, Ziqi Wang, Heng Ji

Resource-efficient training optimization techniques are becoming increasingly important as the size of large language models (LLMs) continues to grow. In particular, batch packing is commonly used in pre-training and supervised fine-tuning…

Computation and Language · Computer Science 2026-03-02 Jaekyung Cho

Direct Preference Optimization (DPO) and its variants have become the de facto standards for aligning large language models (LLMs) with human preferences or specific goals. However, DPO requires high-quality preference data and suffers from…

Machine Learning · Computer Science 2024-11-12 Zhuotong Chen, Fang Liu, Jennifer Zhu, Wanyu Du, Yanjun Qi

Preference-based reinforcement learning (RL) offers a promising approach for aligning policies with human intent but is often constrained by the high cost of human feedback. In this work, we introduce PrefVLM, a framework that integrates…

Machine Learning · Computer Science 2025-02-04 Udita Ghosh, Dripta S. Raychaudhuri, Jiachen Li, Konstantinos Karydis, Amit Roy-Chowdhury

Preference alignment is a critical step in making Large Language Models (LLMs) useful and aligned with (human) preferences. Existing approaches such as Reinforcement Learning from Human Feedback or Direct Preference Optimization typically…

Computation and Language · Computer Science 2025-09-30 Lucio La Cava, Andrea Tagarelli

Effective training of language models (LMs) for mathematical reasoning tasks demands high-quality supervised fine-tuning data. Besides obtaining annotations from human experts, a common alternative is sampling from larger and more powerful…

Computation and Language · Computer Science 2024-07-26 Tianduo Wang, Shichen Li, Wei Lu

Preference finetuning methods like Direct Preference Optimization (DPO) with AI-generated feedback have shown promise in aligning Large Vision-Language Models (LVLMs) with human preferences. However, existing techniques overlook the…

Artificial Intelligence · Computer Science 2025-10-03 Rohan Wadhawan, Fabrice Y Harel-Canada, Zi-Yi Dou, Suhaila Shakiah, Robinson Piramuthu, Nanyun Peng

Automatic prompt optimization is an important approach to improving the performance of large language models (LLMs). Recent research demonstrates the potential of using LLMs as prompt optimizers, which can generate improved task prompts via…

Computation and Language · Computer Science 2025-01-28 Xinyu Tang, Xiaolei Wang, Wayne Xin Zhao, Siyuan Lu, Yaliang Li, Ji-Rong Wen