Related papers: Drift: Decoding-time Personalized Alignments with …

200 papers

The success of AI assistants based on Language Models (LLMs) hinges on Reinforcement Learning from Human Feedback (RLHF) to comprehend and align with user intentions. However, traditional alignment algorithms, such as PPO, are hampered by…

Computation and Language · Computer Science 2024-07-03 Songyang Gao, Qiming Ge, Wei Shen, Shihan Dou, Junjie Ye, Xiao Wang, Rui Zheng, Yicheng Zou, Zhi Chen, Hang Yan, Qi Zhang, Dahua Lin

Reinforcement Learning from Human Feedback (RLHF), using algorithms like Proximal Policy Optimization (PPO), aligns Large Language Models (LLMs) with human values but is costly and unstable. Alternatives have been proposed to replace PPO or…

Computation and Language · Computer Science 2026-04-03 Liang Zhu, Feiteng Fang, Yuelin Bai, Longze Chen, Zhexiang Zhang, Minghuan Tan, Min Yang

Fine-tuning large language models (LLMs) to align with user preferences is challenging due to the high cost of quality human annotations in Reinforcement Learning from Human Feedback (RLHF) and the generalizability limitations of AI…

Modern large language models (LLMs) are optimized for human-aligned responses using Reinforcement Learning from Human Feedback (RLHF). However, existing RLHF approaches assume a universal preference model and fail to account for individual…

Machine Learning · Computer Science 2025-03-11 Idan Shenfeld, Felix Faltings, Pulkit Agrawal, Aldo Pacchiano

Personalizing large language models (LLMs) to accommodate diverse user preferences is essential for enhancing alignment and user satisfaction. Traditional reinforcement learning from human feedback (RLHF) approaches often rely on monolithic…

Machine Learning · Computer Science 2025-04-22 Avinandan Bose, Zhihan Xiong, Yuejie Chi, Simon Shaolei Du, Lin Xiao, Maryam Fazel

Large Language Models (LLMs) acquire extensive knowledge and remarkable abilities from extensive text corpora, making them powerful tools for various applications. To make LLMs more usable, aligning them with human preferences is essential.…

Computation and Language · Computer Science 2024-10-21 Mozhi Zhang, Pengyu Wang, Chenkun Tan, Mianqiu Huang, Dong Zhang, Yaqian Zhou, Xipeng Qiu

Reinforcement Learning from Human Feedback (RLHF) is a key method for aligning large language models (LLMs) with human preferences. However, current offline alignment approaches like DPO, IPO, and SLiC rely heavily on fixed preference…

Machine Learning · Computer Science 2024-06-25 Mucong Ding, Souradip Chakraborty, Vibhu Agrawal, Zora Che, Alec Koppel, Mengdi Wang, Amrit Bedi, Furong Huang

Personalized large language models (LLMs) are designed to tailor responses to individual user preferences. While Reinforcement Learning from Human Feedback (RLHF) is a commonly used framework for aligning LLMs with human preferences,…

Computation and Language · Computer Science 2024-12-10 Xinyu Li, Ruiyang Zhou, Zachary C. Lipton, Liu Leqi

This paper presents a novel approach to aligning large language models (LLMs) with individual human preferences, sometimes referred to as Reinforcement Learning from Personalized Human Feedback (RLPHF). Given stated preferences…

Artificial Intelligence · Computer Science 2024-07-08 Jin Peng Zhou, Katie Z Luo, Jingwen Gu, Jason Yuan, Kilian Q. Weinberger, Wen Sun

Model alignment with human preferences is an essential step in making Large Language Models (LLMs) helpful and consistent with human values. It typically consists of supervised fine-tuning (SFT) and reinforcement learning from human…

Computation and Language · Computer Science 2023-10-10 Yi Dong, Zhilin Wang, Makesh Narsimhan Sreedhar, Xianchao Wu, Oleksii Kuchaiev

While Reinforcement Learning from Human Feedback (RLHF) is widely used to align Large Language Models (LLMs) with human preferences, it typically assumes homogeneous preferences across users, overlooking diverse human values and minority…

Computation and Language · Computer Science 2025-10-28 Yijiang River Dong, Tiancheng Hu, Yinhong Liu, Ahmet Üstün, Nigel Collier

Large Language Models (LLMs) have demonstrated remarkable capabilities in open-ended text generation tasks. However, the inherent open-ended nature of these tasks implies that there is always room for improvement in the quality of model…

Computation and Language · Computer Science 2024-09-16 Ziqi Wang, Le Hou, Tianjian Lu, Yuexin Wu, Yunxuan Li, Hongkun Yu, Heng Ji

Reinforcement Learning from Human Feedback (RLHF) is widely used to align Language Models (LMs) with human preferences. However, existing approaches often neglect individual user preferences, leading to suboptimal personalization. We…

Machine Learning · Computer Science 2024-10-21 Allison Lau, Younwoo Choi, Vahid Balazadeh, Keertana Chidambaram, Vasilis Syrgkanis, Rahul G. Krishnan

Reinforcement Learning from Human Feedback (RLHF) is a powerful paradigm for aligning foundation models to human values and preferences. However, current RLHF techniques cannot account for the naturally occurring differences in individual…

Machine Learning · Computer Science 2024-08-20 Sriyash Poddar, Yanming Wan, Hamish Ivison, Abhishek Gupta, Natasha Jaques

Aligning with personalized preferences, which vary significantly across cultural, educational, and political differences, poses a significant challenge due to the computational costs and data demands of traditional alignment methods. In…

Computation and Language · Computer Science 2025-03-14 Ruizhe Chen, Xiaotian Zhang, Meng Luo, Wenhao Chai, Zuozhu Liu

The rapid evolution of large language models (LLMs) has intensified the demand for effective personalization techniques that can adapt model behavior to individual user preferences. Despite the non-parametric methods utilizing the…

Artificial Intelligence · Computer Science 2025-11-03 Kounianhua Du, Jianxing Liu, Kangning Zhang, Wenxiang Jiao, Yuan Lu, Jiarui Jin, Weiwen Liu, Yong Yu, Weinan Zhang

Aligning large language models (LLMs) with human preferences is crucial for enhancing their utility in terms of helpfulness, truthfulness, safety, harmlessness, and interestingness. Existing methods for achieving this alignment often…

Computation and Language · Computer Science 2024-07-04 Wenhao Liu, Xiaohua Wang, Muling Wu, Tianlong Li, Changze Lv, Zixuan Ling, Jianhao Zhu, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang

Large Language Models (LLMs) are nowadays expected to generate content aligned with human preferences. Current work focuses on alignment at model training time, through techniques such as Reinforcement Learning with Human Feedback (RLHF).…

Artificial Intelligence · Computer Science 2026-01-21 James Y. Huang, Sailik Sengupta, Daniele Bonadiman, Yi-An Lai, Arshit Gupta, Nikolaos Pappas, Saab Mansour, Katrin Kirchhoff, Dan Roth

Aligning large language models (LLMs) with human intentions has become a critical task for safely deploying models in real-world systems. While existing alignment approaches have seen empirical success, theoretically understanding how these…

Machine Learning · Computer Science 2024-08-08 Shawn Im, Yixuan Li

The alignment of large language models (LLMs) with human values is critical as these models become increasingly integrated into various societal and decision-making processes. Traditional methods, such as reinforcement learning from human…

Machine Learning · Computer Science 2025-01-08 Prashant Trivedi, Souradip Chakraborty, Avinash Reddy, Vaneet Aggarwal, Amrit Singh Bedi, George K. Atia