Related papers: Preference-aware Influence-function-based Data Sel…

Preference-Oriented Supervised Fine-Tuning: Favoring Target Model Over Aligned Large Language Models

Alignment, endowing a pre-trained large language model (LLM) with the ability to follow instructions, is crucial for its real-world applications. Conventional supervised fine-tuning (SFT) methods formalize it as causal language modeling…

Computation and Language · Computer Science 2024-12-18 Yuchen Fan, Yuzhong Hong, Qiushi Wang, Junwei Bao, Hongfei Jiang, Yang Song

Data-efficient Fine-tuning for LLM-based Recommendation

Leveraging Large Language Models (LLMs) for recommendation has recently garnered considerable attention, where fine-tuning plays a key role in LLMs' adaptation. However, the cost of fine-tuning LLMs on rapidly expanding recommendation data…

Information Retrieval · Computer Science 2024-06-05 Xinyu Lin, Wenjie Wang, Yongqi Li, Shuo Yang, Fuli Feng, Yinwei Wei, Tat-Seng Chua

PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection

Visual instruction tuning adapts pre-trained Multimodal Large Language Models (MLLMs) to follow human instructions for real-world applications. However, the rapid growth of these datasets introduces significant redundancy, leading to…

Computer Vision and Pattern Recognition · Computer Science 2026-01-14 Jinhe Bi, Aniri, Yifan Wang, Danqi Yan, Wenke Huang, Zengjie Jin, Xiaowen Ma, Sikuan Yan, Artur Hecker, Mang Ye, Xun Xiao, Hinrich Schuetze, Volker Tresp, Yunpu Ma

Efficient Data Selection at Scale via Influence Distillation

Effective data selection is critical for efficient training of modern Large Language Models (LLMs). This paper introduces Influence Distillation, a novel, mathematically-justified framework for data selection that employs second-order…

Computation and Language · Computer Science 2025-05-27 Mahdi Nikdan, Vincent Cohen-Addad, Dan Alistarh, Vahab Mirrokni

PLUM: Improving Code LMs with Execution-Guided On-Policy Preference Learning Driven By Synthetic Test Cases

Preference learning provides a promising solution to address the limitations of supervised fine-tuning (SFT) for code language models, where the model is not explicitly trained to differentiate between correct and incorrect code. Recent…

Computation and Language · Computer Science 2024-10-15 Dylan Zhang, Shizhe Diao, Xueyan Zou, Hao Peng

Data Selection for LLM Alignment Using Fine-Grained Preferences

Large language models (LLMs) alignment aims to ensure that the behavior of LLMs meets human preferences. While collecting data from multiple fine-grained, aspect-specific preferences becomes more and more feasible, existing alignment…

Machine Learning · Computer Science 2026-03-03 Jia Zhang, Yao Liu, Chen-Xi Zhang, Yi Liu, Yi-Xuan Jin, Lan-Zhe Guo, Yu-Feng Li

Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs

This work focuses on leveraging and selecting from vast, unlabeled, open data to pre-fine-tune a pre-trained language model. The goal is to minimize the need for costly domain-specific data for subsequent fine-tuning while achieving desired…

Machine Learning · Computer Science 2024-05-07 Feiyang Kang, Hoang Anh Just, Yifan Sun, Himanshu Jahagirdar, Yuanzhi Zhang, Rongxing Du, Anit Kumar Sahu, Ruoxi Jia

Pref-CTRL: Preference Driven LLM Alignment using Representation Editing

Test-time alignment methods offer a promising alternative to fine-tuning by steering the outputs of large language models (LLMs) at inference time with lightweight interventions on their internal representations. Recently, a prominent and…

Computation and Language · Computer Science 2026-04-28 Imranul Ashrafi, Inigo Jauregi Unanue, Massimo Piccardi

PRISM: A Rich Class of Parameterized Submodular Information Measures for Guided Subset Selection

With ever-increasing dataset sizes, subset selection techniques are becoming increasingly important for a plethora of tasks. It is often necessary to guide the subset selection to achieve certain desiderata, which includes focusing or…

Computer Vision and Pattern Recognition · Computer Science 2022-03-10 Suraj Kothawade, Vishal Kaushal, Ganesh Ramakrishnan, Jeff Bilmes, Rishabh Iyer

Preference VLM: Leveraging VLMs for Scalable Preference-Based Reinforcement Learning

Preference-based reinforcement learning (RL) offers a promising approach for aligning policies with human intent but is often constrained by the high cost of human feedback. In this work, we introduce PrefVLM, a framework that integrates…

Machine Learning · Computer Science 2025-02-04 Udita Ghosh, Dripta S. Raychaudhuri, Jiachen Li, Konstantinos Karydis, Amit Roy-Chowdhury

Consolidation or Adaptation? PRISM: Disentangling SFT and RL Data via Gradient Concentration

While Hybrid Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) has become the standard paradigm for training LLM agents, effective mechanisms for data allocation between these stages remain largely underexplored. Current…

Artificial Intelligence · Computer Science 2026-04-14 Yang Zhao, Yangou Ouyang, Xiao Ding, Hepeng Wang, Bibo Cai, Kai Xiong, Jinglong Gao, Zhouhao Sun, Li Du, Bing Qin, Ting Liu

Preference Alignment with Flow Matching

We present Preference Flow Matching (PFM), a new framework for preference-based reinforcement learning (PbRL) that streamlines the integration of preferences into an arbitrary class of pre-trained models. Existing PbRL methods require…

Machine Learning · Computer Science 2024-10-29 Minu Kim, Yongsik Lee, Sehyeok Kang, Jihwan Oh, Song Chong, Se-Young Yun

PRISM: Preference Refinement via Implicit Scene Modeling for 3D Vision-Language Preference-Based Reinforcement Learning

We propose PRISM, a novel framework designed to overcome the limitations of 2D-based Preference-Based Reinforcement Learning (PBRL) by unifying 3D point cloud modeling and future-aware preference refinement. At its core, PRISM adopts a 3D…

Computation and Language · Computer Science 2025-03-20 Yirong Sun, Yanjun Chen

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

Learning from preference labels plays a crucial role in fine-tuning large language models. There are several distinct approaches for preference fine-tuning, including supervised learning, on-policy reinforcement learning (RL), and…

Machine Learning · Computer Science 2024-06-04 Fahim Tajwar, Anikait Singh, Archit Sharma, Rafael Rafailov, Jeff Schneider, Tengyang Xie, Stefano Ermon, Chelsea Finn, Aviral Kumar

Influence Functions for Preference Dataset Pruning

Language models are commonly fine-tuned via reinforcement learning to alter their behavior or elicit new capabilities. Datasets used for these purposes, and particularly human preference datasets, are often noisy. The relatively small size…

Machine Learning · Computer Science 2025-07-22 Daniel Fein, Gabriela Aranguiz-Dias

RL-Guided Data Selection for Language Model Finetuning

Data selection for finetuning Large Language Models (LLMs) can be framed as a budget-constrained optimization problem: maximizing a model's downstream performance under a strict training data budget. Solving this problem is generally…

Machine Learning · Computer Science 2025-10-01 Animesh Jha, Harshit Gupta, Ananjan Nandi

LLM Data Selection and Utilization via Dynamic Bi-level Optimization

While large-scale training data is fundamental for developing capable large language models (LLMs), strategically selecting high-quality data has emerged as a critical approach to enhance training efficiency and reduce computational costs.…

Machine Learning · Computer Science 2025-07-23 Yang Yu, Kai Han, Hang Zhou, Yehui Tang, Kaiqi Huang, Yunhe Wang, Dacheng Tao

Towards Understanding Valuable Preference Data for Large Language Model Alignment

Large language model (LLM) alignment is typically achieved through learning from human preference comparisons, making the quality of preference data critical to its success. Existing studies often pre-process raw training datasets to…

Machine Learning · Computer Science 2026-03-17 Zizhuo Zhang, Qizhou Wang, Shanshan Ye, Jianing Zhu, Jiangchao Yao, Bo Han, Masashi Sugiyama

Improving Influence-based Instruction Tuning Data Selection for Balanced Learning of Diverse Capabilities

Selecting appropriate training data is crucial for effective instruction fine-tuning of large language models (LLMs), which aims to (1) elicit strong capabilities, and (2) achieve balanced performance across a diverse range of tasks.…

Computation and Language · Computer Science 2025-01-22 Qirun Dai, Dylan Zhang, Jiaqi W. Ma, Hao Peng

BLISS: A Lightweight Bilevel Influence Scoring Method for Data Selection in Language Model Pretraining

Effective data selection is essential for pretraining large language models (LLMs), enhancing efficiency and improving generalization to downstream tasks. However, existing approaches often require leveraging external pretrained models,…

Machine Learning · Computer Science 2026-02-04 Jie Hao, Rui Yu, Wei Zhang, Huixia Wang, Jie Xu, Mingrui Liu