Related papers: Drift: Decoding-time Personalized Alignments with …

Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback

The success of AI assistants based on Language Models (LLMs) hinges on Reinforcement Learning from Human Feedback (RLHF) to comprehend and align with user intentions. However, traditional alignment algorithms, such as PPO, are hampered by…

Computation and Language · Computer Science 2024-07-03 Songyang Gao, Qiming Ge, Wei Shen, Shihan Dou, Junjie Ye, Xiao Wang, Rui Zheng, Yicheng Zou, Zhi Chen, Hang Yan, Qi Zhang, Dahua Lin

DEFT: Distribution-guided Efficient Fine-Tuning for Human Alignment

Reinforcement Learning from Human Feedback (RLHF), using algorithms like Proximal Policy Optimization (PPO), aligns Large Language Models (LLMs) with human values but is costly and unstable. Alternatives have been proposed to replace PPO or…

Computation and Language · Computer Science 2026-04-03 Liang Zhu, Feiteng Fang, Yuelin Bai, Longze Chen, Zhexiang Zhang, Minghuan Tan, Min Yang

RLTHF: Targeted Human Feedback for LLM Alignment

Fine-tuning large language models (LLMs) to align with user preferences is challenging due to the high cost of quality human annotations in Reinforcement Learning from Human Feedback (RLHF) and the generalizability limitations of AI…

Computation and Language · Computer Science 2025-08-08 Yifei Xu, Tusher Chakraborty, Emre Kıcıman, Bibek Aryal, Eduardo Rodrigues, Srinagesh Sharma, Roberto Estevao, Maria Angels de Luis Balaguer, Jessica Wolk, Rafael Padilha, Leonardo Nunes, Shobana Balakrishnan, Songwu Lu, Ranveer Chandra

Language Model Personalization via Reward Factorization

Modern large language models (LLMs) are optimized for human-aligned responses using Reinforcement Learning from Human Feedback (RLHF). However, existing RLHF approaches assume a universal preference model and fail to account for individual…

Machine Learning · Computer Science 2025-03-11 Idan Shenfeld, Felix Faltings, Pulkit Agrawal, Aldo Pacchiano

LoRe: Personalizing LLMs via Low-Rank Reward Modeling

Personalizing large language models (LLMs) to accommodate diverse user preferences is essential for enhancing alignment and user satisfaction. Traditional reinforcement learning from human feedback (RLHF) approaches often rely on monolithic…

Machine Learning · Computer Science 2025-04-22 Avinandan Bose, Zhihan Xiong, Yuejie Chi, Simon Shaolei Du, Lin Xiao, Maryam Fazel

MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time

Large Language Models (LLMs) acquire extensive knowledge and remarkable abilities from extensive text corpora, making them powerful tools for various applications. To make LLMs more usable, aligning them with human preferences is essential.…

Computation and Language · Computer Science 2024-10-21 Mozhi Zhang, Pengyu Wang, Chenkun Tan, Mianqiu Huang, Dong Zhang, Yaqian Zhou, Xipeng Qiu

SAIL: Self-Improving Efficient Online Alignment of Large Language Models

Reinforcement Learning from Human Feedback (RLHF) is a key method for aligning large language models (LLMs) with human preferences. However, current offline alignment approaches like DPO, IPO, and SLiC rely heavily on fixed preference…

Machine Learning · Computer Science 2024-06-25 Mucong Ding, Souradip Chakraborty, Vibhu Agrawal, Zora Che, Alec Koppel, Mengdi Wang, Amrit Bedi, Furong Huang

Personalized Language Modeling from Personalized Human Feedback

Personalized large language models (LLMs) are designed to tailor responses to individual user preferences. While Reinforcement Learning from Human Feedback (RLHF) is a commonly used framework for aligning LLMs with human preferences,…

Computation and Language · Computer Science 2024-12-10 Xinyu Li, Ruiyang Zhou, Zachary C. Lipton, Liu Leqi

Orchestrating LLMs with Different Personalizations

This paper presents a novel approach to aligning large language models (LLMs) with individual human preferences, sometimes referred to as Reinforcement Learning from Personalized Human Feedback (RLPHF). Given stated preferences…

Artificial Intelligence · Computer Science 2024-07-08 Jin Peng Zhou, Katie Z Luo, Jingwen Gu, Jason Yuan, Kilian Q. Weinberger, Wen Sun

SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF

Model alignment with human preferences is an essential step in making Large Language Models (LLMs) helpful and consistent with human values. It typically consists of supervised fine-tuning (SFT) and reinforcement learning from human…

Computation and Language · Computer Science 2023-10-10 Yi Dong, Zhilin Wang, Makesh Narsimhan Sreedhar, Xianchao Wu, Oleksii Kuchaiev

When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning

While Reinforcement Learning from Human Feedback (RLHF) is widely used to align Large Language Models (LLMs) with human preferences, it typically assumes homogeneous preferences across users, overlooking diverse human values and minority…

Computation and Language · Computer Science 2025-10-28 Yijiang River Dong, Tiancheng Hu, Yinhong Liu, Ahmet Üstün, Nigel Collier

Enabling Language Models to Implicitly Learn Self-Improvement

Large Language Models (LLMs) have demonstrated remarkable capabilities in open-ended text generation tasks. However, the inherent open-ended nature of these tasks implies that there is always room for improvement in the quality of model…

Computation and Language · Computer Science 2024-09-16 Ziqi Wang, Le Hou, Tianjian Lu, Yuexin Wu, Yunxuan Li, Hongkun Yu, Heng Ji

Personalized Adaptation via In-Context Preference Learning

Reinforcement Learning from Human Feedback (RLHF) is widely used to align Language Models (LMs) with human preferences. However, existing approaches often neglect individual user preferences, leading to suboptimal personalization. We…

Machine Learning · Computer Science 2024-10-21 Allison Lau, Younwoo Choi, Vahid Balazadeh, Keertana Chidambaram, Vasilis Syrgkanis, Rahul G. Krishnan

Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning

Reinforcement Learning from Human Feedback (RLHF) is a powerful paradigm for aligning foundation models to human values and preferences. However, current RLHF techniques cannot account for the naturally occurring differences in individual…

Machine Learning · Computer Science 2024-08-20 Sriyash Poddar, Yanming Wan, Hamish Ivison, Abhishek Gupta, Natasha Jaques

PAD: Personalized Alignment of LLMs at Decoding-Time

Aligning with personalized preferences, which vary significantly across cultural, educational, and political differences, poses a significant challenge due to the computational costs and data demands of traditional alignment methods. In…

Computation and Language · Computer Science 2025-03-14 Ruizhe Chen, Xiaotian Zhang, Meng Luo, Wenhao Chai, Zuozhu Liu

Fints: Efficient Inference-Time Personalization for LLMs with Fine-Grained Instance-Tailored Steering

The rapid evolution of large language models (LLMs) has intensified the demand for effective personalization techniques that can adapt model behavior to individual user preferences. Despite the non-parametric methods utilizing the…

Artificial Intelligence · Computer Science 2025-11-03 Kounianhua Du, Jianxing Liu, Kangning Zhang, Wenxiang Jiao, Yuan Lu, Jiarui Jin, Weiwen Liu, Yong Yu, Weinan Zhang

Aligning Large Language Models with Human Preferences through Representation Engineering

Aligning large language models (LLMs) with human preferences is crucial for enhancing their utility in terms of helpfulness, truthfulness, safety, harmlessness, and interestingness. Existing methods for achieving this alignment often…

Computation and Language · Computer Science 2024-07-04 Wenhao Liu, Xiaohua Wang, Muling Wu, Tianlong Li, Changze Lv, Zixuan Ling, Jianhao Zhu, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang

DeAL: Decoding-time Alignment for Large Language Models

Large Language Models (LLMs) are nowadays expected to generate content aligned with human preferences. Current work focuses on alignment at model training time, through techniques such as Reinforcement Learning with Human Feedback (RLHF).…

Artificial Intelligence · Computer Science 2026-01-21 James Y. Huang, Sailik Sengupta, Daniele Bonadiman, Yi-An Lai, Arshit Gupta, Nikolaos Pappas, Saab Mansour, Katrin Kirchhoff, Dan Roth

Understanding the Learning Dynamics of Alignment with Human Feedback

Aligning large language models (LLMs) with human intentions has become a critical task for safely deploying models in real-world systems. While existing alignment approaches have seen empirical success, theoretically understanding how these…

Machine Learning · Computer Science 2024-08-08 Shawn Im, Yixuan Li

Align-Pro: A Principled Approach to Prompt Optimization for LLM Alignment

The alignment of large language models (LLMs) with human values is critical as these models become increasingly integrated into various societal and decision-making processes. Traditional methods, such as reinforcement learning from human…

Machine Learning · Computer Science 2025-01-08 Prashant Trivedi, Souradip Chakraborty, Avinash Reddy, Vaneet Aggarwal, Amrit Singh Bedi, George K. Atia