Related papers: One Model for All: Multi-Objective Controllable La…

Multi-Objective Alignment of Large Language Models Through Hypervolume Maximization

Multi-objective alignment from human feedback (MOAHF) in large language models (LLMs) is a challenging problem as human preferences are complex, multifaceted, and often conflicting. Recent works on MOAHF considered a-priori multi-objective…

Machine Learning · Computer Science · 2024-12-10 · Subhojyoti Mukherjee, Anusha Lalitha, Sailik Sengupta, Aniket Deshmukh, Branislav Kveton

Multi-Objective Preference Optimization: Improving Human Alignment of Generative Models

Post-training of LLMs with RLHF, and subsequently preference optimization algorithms such as DPO, IPO, etc., made a big difference in improving human alignment. However, all such techniques can only work with a single (human) objective. In…

Machine Learning · Computer Science · 2025-05-19 · Akhil Agnihotri, Rahul Jain, Deepak Ramachandran, Zheng Wen

MPO: An Efficient Post-Processing Framework for Mixing Diverse Preference Alignment

Reinforcement Learning from Human Feedback (RLHF) has shown promise in aligning large language models (LLMs). Yet its reliance on a singular reward model often overlooks the diversity of human preferences. Recent approaches address this…

Computation and Language · Computer Science · 2025-07-23 · Tianze Wang, Dongnan Gui, Yifan Hu, Shuhang Lin, Linjun Zhang

Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization

A single language model, even when aligned with labelers through reinforcement learning from human feedback (RLHF), may not suit all human preferences. Recent approaches therefore prefer customization, gathering multi-dimensional feedback,…

Machine Learning · Computer Science · 2024-08-20 · Zhanhui Zhou, Jie Liu, Jing Shao, Xiangyu Yue, Chao Yang, Wanli Ouyang, Yu Qiao

Orchestrating LLMs with Different Personalizations

This paper presents a novel approach to aligning large language models (LLMs) with individual human preferences, sometimes referred to as Reinforcement Learning from Personalized Human Feedback (RLPHF). Given stated preferences…

Artificial Intelligence · Computer Science · 2024-07-08 · Jin Peng Zhou, Katie Z Luo, Jingwen Gu, Jason Yuan, Kilian Q. Weinberger, Wen Sun

UC-MOA: Utility-Conditioned Multi-Objective Alignment for Distributional Pareto-Optimality

Reinforcement Learning from Human Feedback (RLHF) has become a cornerstone for aligning large language models (LLMs) with human values. However, existing approaches struggle to capture the multi-dimensional, distributional nuances of human…

Computation and Language · Computer Science · 2025-05-20 · Zelei Cheng, Xin-Qiang Cai, Yuting Tang, Pushi Zhang, Boming Yang, Masashi Sugiyama, Xinyu Xing

MATO: Multi-objective Personalized Alignment with Test-time Optimization for Large Language Models

Aligning large language models (LLMs) with diverse and multifaceted user preferences is a fundamental challenge in personalized AI systems. Existing multi-objective alignment methods either rely on costly training or require pre-trained…

Computation and Language · Computer Science · 2026-05-26 · Linhao Luo, Thuy-Trang Vu, Van-Anh Nguyen, Junae Kim, Gholamreza Haffari, Dinh Phung

Parameter-Efficient Tuning Helps Language Model Alignment

Aligning large language models (LLMs) with human preferences is essential for safe and useful LLMs. Previous works mainly adopt reinforcement learning (RLHF) and direct preference optimization (DPO) with human feedback for alignment.…

Computation and Language · Computer Science · 2023-10-03 · Tianci Xue, Ziqi Wang, Heng Ji

Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging

While Reinforcement Learning from Human Feedback (RLHF) aligns Large Language Models (LLMs) with general, aggregate human preferences, it is suboptimal for learning diverse, individual perspectives. In this work, we study Reinforcement…

Computation and Language · Computer Science · 2023-10-19 · Joel Jang, Seungone Kim, Bill Yuchen Lin, Yizhong Wang, Jack Hessel, Luke Zettlemoyer, Hannaneh Hajishirzi, Yejin Choi, Prithviraj Ammanabrolu

LoRe: Personalizing LLMs via Low-Rank Reward Modeling

Personalizing large language models (LLMs) to accommodate diverse user preferences is essential for enhancing alignment and user satisfaction. Traditional reinforcement learning from human feedback (RLHF) approaches often rely on monolithic…

Machine Learning · Computer Science · 2025-04-22 · Avinandan Bose, Zhihan Xiong, Yuejie Chi, Simon Shaolei Du, Lin Xiao, Maryam Fazel

Unified Preference Optimization: Language Model Alignment Beyond the Preference Frontier

For aligning large language models (LLMs), prior work has leveraged reinforcement learning via human feedback (RLHF) or variations of direct preference optimization (DPO). While DPO offers a simpler framework based on maximum likelihood…

Artificial Intelligence · Computer Science · 2025-05-27 · Anirudhan Badrinath, Prabhat Agarwal, Jiajing Xu

Projection Optimization: A General Framework for Multi-Objective and Multi-Group RLHF

Reinforcement Learning with Human Feedback (RLHF) is a widely used fine-tuning approach that aligns machine learning models, particularly language models (LMs), with human preferences. There are typically multiple objectives driving the…

Machine Learning · Computer Science · 2025-02-25 · Nuoya Xiong, Aarti Singh

When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning

While Reinforcement Learning from Human Feedback (RLHF) is widely used to align Large Language Models (LLMs) with human preferences, it typically assumes homogeneous preferences across users, overlooking diverse human values and minority…

Computation and Language · Computer Science · 2025-10-28 · Yijiang River Dong, Tiancheng Hu, Yinhong Liu, Ahmet Üstün, Nigel Collier

Gradient-Adaptive Policy Optimization: Towards Multi-Objective Alignment of Large Language Models

Reinforcement Learning from Human Feedback (RLHF) has emerged as a powerful technique for aligning large language models (LLMs) with human preferences. However, effectively aligning LLMs with diverse human preferences remains a significant…

Computation and Language · Computer Science · 2025-07-03 · Chengao Li, Hanyu Zhang, Yunkun Xu, Hongyan Xue, Xiang Ao, Qing He

One Adapts to Any: Meta Reward Modeling for Personalized LLM Alignment

Alignment of Large Language Models (LLMs) aims to align outputs with human preferences, and personalized alignment further adapts models to individual users. This relies on personalized reward models that capture user-specific preferences…

Computation and Language · Computer Science · 2026-04-21 · Hongru Cai, Yongqi Li, Tiezheng Yu, Fengbin Zhu, Wenjie Wang, Fuli Feng, Wenjie Li

PMoL: Parameter Efficient MoE for Preference Mixing of LLM Alignment

Reinforcement Learning from Human Feedback (RLHF) has been proven to be an effective method for preference alignment of large language models (LLMs) and is widely used in the post-training process of LLMs. However, RLHF struggles with…

Computation and Language · Computer Science · 2024-11-05 · Dongxu Liu, Bing Xu, Yinzhuo Chen, Bufan Xu, Wenpeng Lu, Muyun Yang, Tiejun Zhao

Pareto Multi-Objective Alignment for Language Models

Large language models (LLMs) are increasingly deployed in real-world applications that require careful balancing of multiple, often conflicting, objectives, such as informativeness versus conciseness, or helpfulness versus creativity.…

Machine Learning · Computer Science · 2025-08-12 · Qiang He, Setareh Maghsudi

Language Model Personalization via Reward Factorization

Modern large language models (LLMs) are optimized for human-aligned responses using Reinforcement Learning from Human Feedback (RLHF). However, existing RLHF approaches assume a universal preference model and fail to account for individual…

Machine Learning · Computer Science · 2025-03-11 · Idan Shenfeld, Felix Faltings, Pulkit Agrawal, Aldo Pacchiano

Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts

Reinforcement learning from human feedback (RLHF) has emerged as the primary method for aligning large language models (LLMs) with human preferences. The RLHF process typically starts by training a reward model (RM) using human preference…

Machine Learning · Computer Science · 2024-06-19 · Haoxiang Wang, Wei Xiong, Tengyang Xie, Han Zhao, Tong Zhang

PCHC: Enabling Preference Conditioned Humanoid Control via Multi-Objective Reinforcement Learning

Humanoid robots often need to balance competing objectives, such as maximizing speed while minimizing energy consumption. While current reinforcement learning (RL) methods can master complex skills like fall recovery and perceptive…

Robotics · Computer Science · 2026-03-26 · Huanyu Li, Dewei Wang, Xinmiao Wang, Xinzhe Liu, Peng Liu, Chenjia Bai, Xuelong Li