Related papers

Related papers: DeAL: Decoding-time Alignment for Large Language M…

200 papers

Aligning language models with human preferences is crucial for reducing errors and biases in these models. Alignment techniques, such as reinforcement learning from human feedback (RLHF), are typically cast as optimizing a tradeoff between…

We introduce ALaRM, the first framework modeling hierarchical rewards in reinforcement learning from human feedback (RLHF), which is designed to enhance the alignment of large language models (LLMs) with human preferences. The framework…

Computation and Language · Computer Science · 2024-03-19 · Yuhang Lai, Siyuan Wang, Shujun Liu, Xuanjing Huang, Zhongyu Wei

Large Language Models (LLMs) demonstrate transformative potential, yet their reasoning remains inconsistent and unreliable. Reinforcement learning (RL)-based fine-tuning is a key mechanism for improvement, but its effectiveness is…

Machine Learning · Computer Science · 2026-02-11 · Pei-Chi Pan, Yingbin Liang, Sen Lin

State-of-the-art large language models (LLMs) have become indispensable tools for various tasks. However, training LLMs to serve as effective assistants for humans requires careful consideration. A promising approach is reinforcement…

Reinforcement Learning (RL) has emerged as a transformative approach for aligning and enhancing Large Language Models (LLMs), addressing critical challenges in instruction following, ethical alignment, and reasoning capabilities. This…

Artificial Intelligence · Computer Science · 2025-07-08 · Saksham Sahai Srivastava, Vaneet Aggarwal

Reinforcement Learning from Human Feedback (RLHF) is a key method for aligning large language models (LLMs) with human preferences. However, current offline alignment approaches like DPO, IPO, and SLiC rely heavily on fixed preference…

Machine Learning · Computer Science · 2024-06-25 · Mucong Ding, Souradip Chakraborty, Vibhu Agrawal, Zora Che, Alec Koppel, Mengdi Wang, Amrit Bedi, Furong Huang

Aligning with personalized preferences, which vary significantly across cultural, educational, and political differences, poses a significant challenge due to the computational costs and data demands of traditional alignment methods. In…

Computation and Language · Computer Science · 2025-03-14 · Ruizhe Chen, Xiaotian Zhang, Meng Luo, Wenhao Chai, Zuozhu Liu

Recent advances in large language models (LLMs) have demonstrated significant progress in performing complex tasks. While Reinforcement Learning from Human Feedback (RLHF) has been effective in aligning LLMs with human preferences, it is…

Machine Learning · Computer Science · 2025-05-30 · Chaoqi Wang, Zhuokai Zhao, Yibo Jiang, Zhaorun Chen, Chen Zhu, Yuxin Chen, Jiayi Liu, Lizhu Zhang, Xiangjun Fan, Hao Ma, Sinong Wang

The alignment of large language models (LLMs) aims to ensure their outputs adhere to human values, ethical standards, and legal norms. Traditional alignment methods often rely on resource-intensive fine-tuning (FT), which may suffer from…

Computation and Language · Computer Science · 2025-09-11 · Birong Pan, Yongqi Li, Weiyu Zhang, Wenpeng Lu, Mayi Xu, Shen Zhou, Yuanyuan Zhu, Ming Zhong, Tieyun Qian

Aligning large language models with human objectives is paramount, yet common approaches including RLHF suffer from unstable and resource-intensive training. In response to this challenge, we introduce ARGS, Alignment as Reward-Guided…

Computation and Language · Computer Science · 2024-02-06 · Maxim Khanov, Jirayu Burapacheep, Yixuan Li

Large Language Models (LLMs) acquire extensive knowledge and remarkable abilities from extensive text corpora, making them powerful tools for various applications. To make LLMs more usable, aligning them with human preferences is essential.…

Computation and Language · Computer Science · 2024-10-21 · Mozhi Zhang, Pengyu Wang, Chenkun Tan, Mianqiu Huang, Dong Zhang, Yaqian Zhou, Xipeng Qiu

AI Alignment, primarily in the form of Reinforcement Learning from Human Feedback (RLHF), has been a cornerstone of the post-training phase in developing Large Language Models (LLMs). It has also been a popular research topic across various…

Computation and Language · Computer Science · 2025-08-26 · Ilias Chalkidis

Reinforcement Learning from Human Feedback (RLHF) has been credited as the key advance that has allowed Large Language Models (LLMs) to effectively follow instructions and produce useful assistance. Classically, this involves generating…

Machine Learning · Computer Science · 2024-02-02 · Alex J. Chan, Hao Sun, Samuel Holt, Mihaela van der Schaar

Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique to make large language models (LLMs) more capable in complex settings. RLHF proceeds by collecting human preference data, training a reward model on said…

Machine Learning · Computer Science · 2024-02-05 · Nathan Lambert, Roberto Calandra

Alignment of Large Language Models (LLMs) is crucial for safe and trustworthy deployment in applications. Reinforcement learning from human feedback (RLHF) has emerged as an effective technique to align LLMs to human preferences and broader…

Large Language Models (LLMs) have made substantial strides in structured tasks through Reinforcement Learning (RL), demonstrating proficiency in mathematical reasoning and code generation. However, applying RL in broader domains like…

Computation and Language · Computer Science · 2025-02-10 · Hao Sun, Yunyi Shen, Jean-Francois Ton, Mihaela van der Schaar

Aligning language models (LMs) to human preferences has emerged as a critical pursuit, enabling these models to better serve diverse user needs. Existing methods primarily focus on optimizing LMs for a single reward function, limiting their…

Machine Learning · Computer Science · 2024-10-29 · Ruizhe Shi, Yifang Chen, Yushi Hu, Alisa Liu, Hannaneh Hajishirzi, Noah A. Smith, Simon S. Du

Reinforcement Learning from Human Feedback (RLHF), using algorithms like Proximal Policy Optimization (PPO), aligns Large Language Models (LLMs) with human values but is costly and unstable. Alternatives have been proposed to replace PPO or…

Computation and Language · Computer Science · 2026-04-03 · Liang Zhu, Feiteng Fang, Yuelin Bai, Longze Chen, Zhexiang Zhang, Minghuan Tan, Min Yang

Large Language Models (LLMs) have recently developed new advanced functionalities. Their effectiveness relies on statistical learning and generalization capabilities. However, they face limitations in internalizing the data they process and…

Machine Learning · Computer Science · 2026-01-14 · Farah Ben Slama, Frédéric Armetta

Reward design plays a pivotal role in aligning large language models (LLMs) with human values, serving as the bridge between feedback signals and model optimization. This survey provides a structured organization of reward modeling and…

Computation and Language · Computer Science · 2025-09-03 · Miaomiao Ji, Yanqiu Wu, Zhibin Wu, Shoujin Wang, Jian Yang, Mark Dras, Usman Naseem