Related papers: Adaptive Preference Aggregation

200 papers

Conventional preference learning methods often prioritize more widely held opinions when aggregating preferences from multiple evaluators. This may result in policies that are biased in favor of some types of opinions or groups and…

Artificial Intelligence · Computer Science · 2026-03-03 · Kihyun Kim, Jiawei Zhang, Asuman Ozdaglar, Pablo A. Parrilo

Reinforcement learning from human feedback (RLHF) has been an effective technique for aligning AI systems with human values, with remarkable recent successes in fine-tuning large language models. Most existing RLHF paradigms make the…

Artificial Intelligence · Computer Science · 2024-05-28 · Chanwoo Park, Mingyang Liu, Dingwen Kong, Kaiqing Zhang, Asuman Ozdaglar

In the context of reinforcement learning from human feedback (RLHF), the reward function is generally derived from maximum likelihood estimation of a random utility model based on pairwise comparisons made by humans. The problem of learning…

Computer Science and Game Theory · Computer Science · 2024-11-08 · Luise Ge, Daniel Halpern, Evi Micha, Ariel D. Procaccia, Itai Shapira, Yevgeniy Vorobeychik, Junlin Wu

Despite its empirical success, Reinforcement Learning from Human Feedback (RLHF) has been shown to violate almost all the fundamental axioms in social choice theory -- such as majority consistency, pairwise majority consistency, and…

Machine Learning · Statistics · 2025-06-17 · Jiancong Xiao, Zhekun Shi, Kaizhao Liu, Qi Long, Weijie J. Su

Aligning AI agents to human intentions and values is a key bottleneck in building safe and deployable AI applications. But whose values should AI agents be aligned with? Reinforcement learning with human feedback (RLHF) has emerged as the…

Artificial Intelligence · Computer Science · 2023-10-25 · Abhilash Mishra

This paper addresses the challenge of aligning large language models (LLMs) with diverse human preferences within federated learning (FL) environments, where standard methods often fail to adequately represent diverse viewpoints. We…

Computation and Language · Computer Science · 2025-12-17 · Mahmoud Srewa, Tianyu Zhao, Salma Elmalaki

Aligning with human preferences and values is an important requirement for building contemporary foundation models and embodied AI. However, popular approaches such as reinforcement learning with human feedback (RLHF) break down the task into…

Artificial Intelligence · Computer Science · 2024-12-03 · Chenliang Li, Siliang Zeng, Zeyi Liao, Jiaxiang Li, Dongyeop Kang, Alfredo Garcia, Mingyi Hong

Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values by learning rewards from human preference data. For various reasons, however, such data typically takes the form of rankings…

Machine Learning · Computer Science · 2024-06-06 · Ilgee Hong, Zichong Li, Alexander Bukharin, Yixiao Li, Haoming Jiang, Tianbao Yang, Tuo Zhao

Reinforcement Learning from Human Feedback (RLHF), the standard for aligning Large Language Models (LLMs) with human values, is known to fail to satisfy properties that are intuitively desirable, such as respecting the preferences of the…

Artificial Intelligence · Computer Science · 2025-02-03 · Roberto-Rafael Maura-Rivero, Marc Lanctot, Francesco Visin, Kate Larson

We consider the challenge of AI value alignment with multiple individuals that have different reward functions and optimal policies in an underlying Markov decision process. We formalize this problem as one of policy aggregation, where the…

Artificial Intelligence · Computer Science · 2024-11-07 · Parand A. Alamdari, Soroush Ebadian, Ariel D. Procaccia

The success of AI assistants based on large language models (LLMs) hinges crucially on Reinforcement Learning from Human Feedback (RLHF), which enables the generation of responses more aligned with human preferences. As universal AI assistants,…

Machine Learning · Computer Science · 2023-12-27 · Rui Zheng, Wei Shen, Yuan Hua, Wenbin Lai, Shihan Dou, Yuhao Zhou, Zhiheng Xi, Xiao Wang, Haoran Huang, Tao Gui, Qi Zhang, Xuanjing Huang

Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a single reward model derived from preference data. However, such an approach overlooks the rich diversity of human preferences…

Computation and Language · Computer Science · 2024-12-30 · Souradip Chakraborty, Jiahao Qiu, Hui Yuan, Alec Koppel, Furong Huang, Dinesh Manocha, Amrit Singh Bedi, Mengdi Wang

A key challenge in training Large Language Models (LLMs) is properly aligning them with human preferences. Reinforcement Learning with Human Feedback (RLHF) uses pairwise comparisons from human annotators to train reward functions and has…

Machine Learning · Computer Science · 2025-01-17 · Ariel D. Procaccia, Benjamin Schiffer, Shirley Zhang

One of the challenges of aligning large models with human preferences lies in both the data requirements and the technical complexities of current approaches. Predominant methods, such as RLHF, involve multiple steps, each demanding…

Machine Learning · Computer Science · 2025-03-19 · Siliang Zeng, Yao Liu, Huzefa Rangwala, George Karypis, Mingyi Hong, Rasool Fakoor

Aligning large language models (LLMs) with diverse human preferences requires pluralistic alignment, where a single model must respect the values of multiple distinct groups simultaneously. In federated reinforcement learning from human…

Machine Learning · Computer Science · 2026-04-07 · Mahmoud Srewa, Tianyu Zhao, Salma Elmalaki

Reinforcement Learning from Human Feedback (RLHF) relies on preference modeling to align machine learning systems with human values, yet the popular approach of random pair sampling with Bradley-Terry modeling is statistically limited and…

Human-Computer Interaction · Computer Science · 2025-12-02 · Andreas Chouliaras, Dimitris Chatzopoulos

Reinforcement Learning with Human Feedback (RLHF) is a widely used fine-tuning approach that aligns machine learning models, particularly language models (LMs), with human preferences. There are typically multiple objectives driving the…

Machine Learning · Computer Science · 2025-02-25 · Nuoya Xiong, Aarti Singh

Large language models (LLMs) often produce misleading content, emphasizing the need to align them with human values to ensure secure AI systems. Reinforcement learning from human feedback (RLHF) has been employed to achieve this alignment.…

Computation and Language · Computer Science · 2024-02-28 · Feifan Song, Bowen Yu, Minghao Li, Haiyang Yu, Fei Huang, Yongbin Li, Houfeng Wang

Reinforcement Learning from Human Feedback (RLHF) has become a pivotal paradigm in artificial intelligence to align large models with human preferences. In this paper, we propose a novel statistical framework to simultaneously conduct the…

Machine Learning · Statistics · 2026-05-01 · Nan Lu, Ethan Lee, Ethan X. Fang, Junwei Lu

The success of AI assistants based on Large Language Models (LLMs) hinges on Reinforcement Learning from Human Feedback (RLHF) to comprehend and align with user intentions. However, traditional alignment algorithms, such as PPO, are hampered by…

Computation and Language · Computer Science · 2024-07-03 · Songyang Gao, Qiming Ge, Wei Shen, Shihan Dou, Junjie Ye, Xiao Wang, Rui Zheng, Yicheng Zou, Zhi Chen, Hang Yan, Qi Zhang, Dahua Lin