Related papers: Parameter-Efficient Tuning Helps Language Model Al…

Optimizing LLMs with Direct Preferences: A Data Efficiency Perspective

Aligning the output of Large Language Models (LLMs) with human preferences (e.g., by means of reinforcement learning with human feedback, or RLHF) is essential for ensuring their effectiveness in real-world scenarios. Despite significant…

Artificial Intelligence · Computer Science 2024-10-23 Pietro Bernardelle, Gianluca Demartini

Aligning Large Language Models via Fine-grained Supervision

Pre-trained large-scale language models (LLMs) excel at producing coherent articles, yet their outputs may be untruthful, toxic, or fail to align with user expectations. Current approaches focus on using reinforcement learning with human…

Computation and Language · Computer Science 2024-06-06 Dehong Xu, Liang Qiu, Minseok Kim, Faisal Ladhak, Jaeyoung Do

MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time

Large Language Models (LLMs) acquire extensive knowledge and remarkable abilities from extensive text corpora, making them powerful tools for various applications. To make LLMs more usable, aligning them with human preferences is essential.…

Computation and Language · Computer Science 2024-10-21 Mozhi Zhang, Pengyu Wang, Chenkun Tan, Mianqiu Huang, Dong Zhang, Yaqian Zhou, Xipeng Qiu

Unified Preference Optimization: Language Model Alignment Beyond the Preference Frontier

For aligning large language models (LLMs), prior work has leveraged reinforcement learning via human feedback (RLHF) or variations of direct preference optimization (DPO). While DPO offers a simpler framework based on maximum likelihood…

Artificial Intelligence · Computer Science 2025-05-27 Anirudhan Badrinath, Prabhat Agarwal, Jiajing Xu

Preference-Oriented Supervised Fine-Tuning: Favoring Target Model Over Aligned Large Language Models

Alignment, endowing a pre-trained Large language model (LLM) with the ability to follow instructions, is crucial for its real-world applications. Conventional supervised fine-tuning (SFT) methods formalize it as causal language modeling…

Computation and Language · Computer Science 2024-12-18 Yuchen Fan, Yuzhong Hong, Qiushi Wang, Junwei Bao, Hongfei Jiang, Yang Song

One Model for All: Multi-Objective Controllable Language Models

Aligning large language models (LLMs) with human preferences is critical for enhancing LLMs' safety, helpfulness, humor, faithfulness, etc. Current reinforcement learning from human feedback (RLHF) mainly focuses on a fixed reward learned…

Machine Learning · Computer Science 2026-04-07 Qiang He, Yucheng Yang, Tianyi Zhou, Meng Fang, Mykola Pechenizkiy, Setareh Maghsudi

PAFT: A Parallel Training Paradigm for Effective LLM Fine-Tuning

Large language models (LLMs) have shown remarkable abilities in diverse natural language processing (NLP) tasks. The LLMs generally undergo supervised fine-tuning (SFT) followed by preference alignment to be usable in downstream…

Computation and Language · Computer Science 2024-06-27 Shiva Kumar Pentyala, Zhichao Wang, Bin Bi, Kiran Ramnath, Xiang-Bo Mao, Regunathan Radhakrishnan, Sitaram Asur, Na Cheng

Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model

Large Language Models (LLMs) have become increasingly popular due to their ability to process and generate natural language. However, as they are trained on massive datasets of text, LLMs can inherit harmful biases and produce outputs that…

Computation and Language · Computer Science 2025-01-23 Qi Gou, Cam-Tu Nguyen

A Deep Dive into the Trade-Offs of Parameter-Efficient Preference Alignment Techniques

Large language models are first pre-trained on trillions of tokens and then instruction-tuned or aligned to specific preferences. While pre-training remains out of reach for most researchers due to the compute required, fine-tuning has…

Computation and Language · Computer Science 2024-06-10 Megh Thakkar, Quentin Fournier, Matthew D Riemer, Pin-Yu Chen, Amal Zouaq, Payel Das, Sarath Chandar

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving precise control of their behavior is difficult due to the completely unsupervised nature of their training. Existing…

Machine Learning · Computer Science 2024-07-31 Rafael Rafailov, Archit Sharma, Eric Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn

Preference-grounded Token-level Guidance for Language Model Fine-tuning

Aligning language models (LMs) with preferences is an important problem in natural language generation. A key challenge is that preferences are typically provided at the sequence level while LM training and generation both occur at the…

Computation and Language · Computer Science 2025-01-09 Shentao Yang, Shujian Zhang, Congying Xia, Yihao Feng, Caiming Xiong, Mingyuan Zhou

ULMA: Unified Language Model Alignment with Human Demonstration and Point-wise Preference

Aligning language models to human expectations, e.g., being helpful and harmless, has become a pressing challenge for large language models. A typical alignment procedure consists of supervised fine-tuning and preference learning. Most…

Machine Learning · Computer Science 2024-02-27 Tianchi Cai, Xierui Song, Jiyan Jiang, Fei Teng, Jinjie Gu, Guannan Zhang

Orchestrating LLMs with Different Personalizations

This paper presents a novel approach to aligning large language models (LLMs) with individual human preferences, sometimes referred to as Reinforcement Learning from Personalized Human Feedback (RLPHF). Given stated preferences…

Artificial Intelligence · Computer Science 2024-07-08 Jin Peng Zhou, Katie Z Luo, Jingwen Gu, Jason Yuan, Kilian Q. Weinberger, Wen Sun

Towards Improved Preference Optimization Pipeline: from Data Generation to Budget-Controlled Regularization

Direct Preference Optimization (DPO) and its variants have become the de facto standards for aligning large language models (LLMs) with human preferences or specific goals. However, DPO requires high-quality preference data and suffers from…

Machine Learning · Computer Science 2024-11-12 Zhuotong Chen, Fang Liu, Jennifer Zhu, Wanyu Du, Yanjun Qi

Align-Pro: A Principled Approach to Prompt Optimization for LLM Alignment

The alignment of large language models (LLMs) with human values is critical as these models become increasingly integrated into various societal and decision-making processes. Traditional methods, such as reinforcement learning from human…

Machine Learning · Computer Science 2025-01-08 Prashant Trivedi, Souradip Chakraborty, Avinash Reddy, Vaneet Aggarwal, Amrit Singh Bedi, George K. Atia

Data Selection for LLM Alignment Using Fine-Grained Preferences

Large language models (LLMs) alignment aims to ensure that the behavior of LLMs meets human preferences. While collecting data from multiple fine-grained, aspect-specific preferences becomes more and more feasible, existing alignment…

Machine Learning · Computer Science 2026-03-03 Jia Zhang, Yao Liu, Chen-Xi Zhang, Yi Liu, Yi-Xuan Jin, Lan-Zhe Guo, Yu-Feng Li

OPTune: Efficient Online Preference Tuning

Reinforcement learning with human feedback (RLHF) is critical for aligning Large Language Models (LLMs) with human preference. Compared to the widely studied offline version of RLHF, e.g., direct preference optimization (DPO), recent…

Machine Learning · Computer Science 2024-06-13 Lichang Chen, Jiuhai Chen, Chenxi Liu, John Kirchenbauer, Davit Soselia, Chen Zhu, Tom Goldstein, Tianyi Zhou, Heng Huang

Preference Learning Algorithms Do Not Learn Preference Rankings

Preference learning algorithms (e.g., RLHF and DPO) are frequently used to steer LLMs to produce generations that are more preferred by humans, but our understanding of their inner workings is still limited. In this work, we study the…

Machine Learning · Computer Science 2024-11-01 Angelica Chen, Sadhika Malladi, Lily H. Zhang, Xinyi Chen, Qiuyi Zhang, Rajesh Ranganath, Kyunghyun Cho

Aligning Large Language Models with Human Preferences through Representation Engineering

Aligning large language models (LLMs) with human preferences is crucial for enhancing their utility in terms of helpfulness, truthfulness, safety, harmlessness, and interestingness. Existing methods for achieving this alignment often…

Computation and Language · Computer Science 2024-07-04 Wenhao Liu, Xiaohua Wang, Muling Wu, Tianlong Li, Changze Lv, Zixuan Ling, Jianhao Zhu, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang

Larger or Smaller Reward Margins to Select Preferences for Alignment?

Preference learning is critical for aligning large language models (LLMs) with human values, with the quality of preference datasets playing a crucial role in this process. While existing metrics primarily assess data quality based on…

Machine Learning · Computer Science 2025-03-05 Kexin Huang, Junkang Wu, Ziqian Chen, Xue Wang, Jinyang Gao, Bolin Ding, Jiancan Wu, Xiangnan He, Xiang Wang