Related papers: Learning from Reference Answers: Versatile Languag…

200 papers

Large language models (LLMs) are increasingly deployed via public-facing interfaces to interact with millions of users, each with diverse preferences. Despite this, preference tuning of LLMs predominantly relies on reward models trained…

Computation and Language · Computer Science 2024-12-06 Vishakh Padmakumar, Chuanyang Jin, Hannah Rose Kirk, He He

Aligning language models (LMs) based on human-annotated preference data is a crucial step in obtaining practical and performant LM-based systems. However, multilingual human preference data are difficult to obtain at scale, making it…

Computation and Language · Computer Science 2024-10-15 Zhaofeng Wu, Ananth Balashankar, Yoon Kim, Jacob Eisenstein, Ahmad Beirami

Reward Modeling is critical in evaluating and improving the generation of Large Language Models (LLMs). While numerous recent works have shown its feasibility in improving safety, helpfulness, reasoning, and instruction-following ability,…

Computation and Language · Computer Science 2025-11-13 Hanning Zhang, Juntong Song, Juno Zhu, Yuanhao Wu, Tong Zhang, Cheng Niu

In aligning large language models (LLMs), reward models have played an important role, but are standardly trained as discriminative models and rely only on labeled human preference data. In this paper, we explore methods that train reward…

Computation and Language · Computer Science 2026-01-27 Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Qiaozhi He, Murun Yang, Bei Li, Tong Xiao, Chunliang Zhang, Tongran Liu, Jingbo Zhu

Preference alignment in Large Language Models (LLMs) has significantly improved their ability to adhere to human instructions and intentions. However, existing direct alignment algorithms primarily focus on relative preferences and often…

Machine Learning · Computer Science 2025-05-13 Shenao Zhang, Zhihan Liu, Boyi Liu, Yufeng Zhang, Yingxiang Yang, Yongfei Liu, Liyu Chen, Tao Sun, Zhaoran Wang

Despite the significant progress made by existing retrieval augmented language models (RALMs) in providing trustworthy responses and grounding in reliable sources, they often overlook effective alignment with human preferences. In the…

Computation and Language · Computer Science 2024-12-19 Zhuoran Jin, Hongbang Yuan, Tianyi Men, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao

Reward models (RMs) are essential for aligning Large Language Models (LLMs) with human preferences. However, they often struggle with capturing complex human preferences and generalizing to unseen data. To address these challenges, we…

Computation and Language · Computer Science 2025-08-06 Anamika Lochab, Ruqi Zhang

Recent self-rewarding large language models (LLMs) have successfully applied LLM-as-a-Judge to iteratively improve alignment performance without the need for human annotations for preference data. These methods commonly utilize the same…

Machine Learning · Computer Science 2025-04-29 Zhaoyang Wang, Weilei He, Zhiyuan Liang, Xuchao Zhang, Chetan Bansal, Ying Wei, Weitong Zhang, Huaxiu Yao

Alignment of large language models (LLMs) typically involves training a reward model on preference data, followed by policy optimization with respect to the reward model. However, optimizing policies with respect to a single reward model…

Machine Learning · Computer Science 2025-07-23 Debangshu Banerjee, Kintan Saha, Aditya Gopalan

Reward learning enables the application of reinforcement learning (RL) to tasks where reward is defined by human judgment, building a model of reward by asking humans questions. Most work on reward learning has used simulated environments,…

Computation and Language · Computer Science 2020-01-10 Daniel M. Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, Geoffrey Irving

Large Language Models (LLMs) exhibit impressive capabilities but require careful alignment with human preferences. Traditional training-time methods finetune LLMs using human preference datasets but incur significant training costs and…

Computation and Language · Computer Science 2025-07-16 Yuancheng Xu, Udari Madhushani Sehwag, Alec Koppel, Sicheng Zhu, Bang An, Furong Huang, Sumitra Ganesh

Reward Models, essential for guiding Large Language Model optimization, are typically trained on fixed preference datasets, resulting in rigid alignment to single, implicit preference distributions. This prevents adaptation to diverse…

Computation and Language · Computer Science 2025-07-08 Zhuohao Yu, Jiali Zeng, Weizheng Gu, Yidong Wang, Jindong Wang, Fandong Meng, Jie Zhou, Yue Zhang, Shikun Zhang, Wei Ye

Preference optimization, particularly through Reinforcement Learning from Human Feedback (RLHF), has achieved significant success in aligning Large Language Models (LLMs) to adhere to human intentions. Unlike offline alignment with a fixed…

Machine Learning · Computer Science 2024-11-06 Shenao Zhang, Donghan Yu, Hiteshi Sharma, Han Zhong, Zhihan Liu, Ziyi Yang, Shuohang Wang, Hany Hassan, Zhaoran Wang

Aligning the behavior of large language models (LLMs) with human intentions and values remains a critical challenge. Reinforcement learning from human feedback (RLHF) aligns LLMs by training a reward model (RM) on human preferences and…

Computation and Language · Computer Science 2025-12-25 Jiayi Zhou, Jiaming Ji, Juntao Dai, Dong Li, Yaodong Yang

We study estimation and statistical inference for reward models used in aligning large language models (LLMs). A key component of LLM alignment is reinforcement learning from human feedback (RLHF), where humans compare pairs of…

Machine Learning · Statistics 2025-12-04 Pangpang Liu, Junwei Lu, Will Wei Sun

Large Language Models (LLMs) have made substantial strides in structured tasks through Reinforcement Learning (RL), demonstrating proficiency in mathematical reasoning and code generation. However, applying RL in broader domains like…

Computation and Language · Computer Science 2025-02-10 Hao Sun, Yunyi Shen, Jean-Francois Ton, Mihaela van der Schaar

Recent advances in large language models (LLMs) have demonstrated significant progress in performing complex tasks. While Reinforcement Learning from Human Feedback (RLHF) has been effective in aligning LLMs with human preferences, it is…

Machine Learning · Computer Science 2025-05-30 Chaoqi Wang, Zhuokai Zhao, Yibo Jiang, Zhaorun Chen, Chen Zhu, Yuxin Chen, Jiayi Liu, Lizhu Zhang, Xiangjun Fan, Hao Ma, Sinong Wang

Large Language Models (LLMs) acquire extensive knowledge and remarkable abilities from extensive text corpora, making them powerful tools for various applications. To make LLMs more usable, aligning them with human preferences is essential.…

Computation and Language · Computer Science 2024-10-21 Mozhi Zhang, Pengyu Wang, Chenkun Tan, Mianqiu Huang, Dong Zhang, Yaqian Zhou, Xipeng Qiu

Reinforcement learning (RL) can align language models with non-differentiable reward signals, such as human preferences. However, a major challenge arises from the sparsity of these reward signals - typically, there is only a single reward…

Computation and Language · Computer Science 2024-02-20 Meng Cao, Lei Shu, Lei Yu, Yun Zhu, Nevan Wichers, Yinxiao Liu, Lei Meng

Deep Reinforcement Learning is widely used for aligning Large Language Models (LLMs) with human preferences. However, conventional reward modelling is predominantly dependent on human annotations provided by a select cohort of…

Artificial Intelligence · Computer Science 2024-05-31 Dexun Li, Cong Zhang, Kuicai Dong, Derrick Goh Xin Deik, Ruiming Tang, Yong Liu