Related papers: Improving Context-Aware Preference Modeling for La…

200 papers

Preference alignment in Large Language Models (LLMs) has significantly improved their ability to adhere to human instructions and intentions. However, existing direct alignment algorithms primarily focus on relative preferences and often…

Machine Learning · Computer Science · 2025-05-13 · Shenao Zhang, Zhihan Liu, Boyi Liu, Yufeng Zhang, Yingxiang Yang, Yongfei Liu, Liyu Chen, Tao Sun, Zhaoran Wang

Learning human preferences in language models remains fundamentally challenging, as reward modeling relies on subtle, subjective comparisons or shades of gray rather than clear-cut labels. This study investigates the limits of current…

Computation and Language · Computer Science · 2026-04-03 · Simona-Vasilica Oprea, Adela Bâra

Preference-based reinforcement learning is an effective way to handle tasks where rewards are hard to specify but can be exceedingly inefficient as preference learning is often tabula rasa. We demonstrate that Large Language Models (LLMs)…

Artificial Intelligence · Computer Science · 2025-04-04 · Chao Yu, Qixin Tan, Hong Lu, Jiaxuan Gao, Xinting Yang, Yu Wang, Yi Wu, Eugene Vinitsky

Preference tuning is a crucial process for aligning deep generative models with human preferences. This survey offers a thorough overview of recent advancements in preference tuning and the integration of human feedback. The paper is…

Computation and Language · Computer Science · 2024-11-05 · Genta Indra Winata, Hanyang Zhao, Anirban Das, Wenpin Tang, David D. Yao, Shi-Xiong Zhang, Sambit Sahu

Aligning language models with human preferences through reinforcement learning from human feedback is crucial for their safe and effective deployment. The human preference is typically represented through comparison where one response is…

Machine Learning · Computer Science · 2025-07-15 · Hoang Anh Just, Ming Jin, Anit Sahu, Huy Phan, Ruoxi Jia

Reinforcement learning from human feedback usually models preferences using a reward function that does not distinguish between people. We argue that this is unlikely to be a good design choice in contexts with high potential for…

Language models are known to encode a great amount of factual knowledge through pretraining. However, such knowledge might be insufficient to cater to user requests, requiring the model to integrate external knowledge sources and adhere to…

Computation and Language · Computer Science · 2024-07-19 · Sizhe Zhou, Sha Li, Yu Meng, Yizhu Jiao, Heng Ji, Jiawei Han

Humans use social context to specify preferences over behaviors, i.e. their reward functions. Yet, algorithms for inferring reward models from preference data do not take this social learning view into account. Inspired by pragmatic human…

Machine Learning · Computer Science · 2024-05-24 · Andi Peng, Yuying Sun, Tianmin Shu, David Abel

Reward modelling from preference data is a crucial step in aligning large language models (LLMs) with human values, requiring robust generalisation to novel prompt-response pairs. In this work, we propose to frame this problem in a causal…

Artificial Intelligence · Computer Science · 2026-05-12 · Katarzyna Kobalczyk, Mihaela van der Schaar

Reinforcement Learning from Human Feedback (RLHF) has become a crucial technology for aligning language models with human values and intentions, enabling models to produce more helpful and harmless responses. Reward models are trained as…

Pretrained language models often do not perform tasks in ways that are in line with our preferences, e.g., generating offensive text or factually incorrect summaries. Recent work approaches the above issue by learning from a simple form of…

Computation and Language · Computer Science · 2022-11-18 · Jérémy Scheurer, Jon Ander Campos, Jun Shern Chan, Angelica Chen, Kyunghyun Cho, Ethan Perez

Large language models (LLMs) have achieved remarkable success, yet aligning their generations with human preferences remains a critical challenge. Existing approaches to preference modeling often rely on an explicit or implicit reward…

Computation and Language · Computer Science · 2025-05-09 · Zhuocheng Gong, Jian Guan, Wei Wu, Huishuai Zhang, Dongyan Zhao

Language model users often issue queries that lack specification, where the context under which a query was issued -- such as the user's identity, the query's intent, and the criteria for a response to be useful -- is not explicit. For…

Computation and Language · Computer Science · 2025-05-27 · Chaitanya Malaviya, Joseph Chee Chang, Dan Roth, Mohit Iyyer, Mark Yatskar, Kyle Lo

Reward models (RMs) are essential for aligning large language models (LLMs) with human preferences to improve interaction quality. However, the real world is pluralistic, which leads to diversified human preferences with respect to…

Computation and Language · Computer Science · 2023-09-18 · Pengyu Cheng, Jiawen Xie, Ke Bai, Yong Dai, Nan Du

Learning from preference feedback has emerged as an essential step for improving the generation quality and performance of modern language models (LMs). Despite its widespread use, the way preference-based learning is applied varies wildly,…

Computation and Language · Computer Science · 2024-10-10 · Hamish Ivison, Yizhong Wang, Jiacheng Liu, Zeqiu Wu, Valentina Pyatkin, Nathan Lambert, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi

Context-aware machine translation (MT) leverages document-level information, yet it does not consistently outperform sentence-level MT, as contextual signals are unevenly beneficial across sentences. Existing training objectives do not…

Computation and Language · Computer Science · 2026-03-27 · Ying Li, Xinglin Lyu, Junhui Li, Jinlong Yang, Hengchao Shang, Min Zhang, Shimin Tao, Daimeng Wei

Reward modeling has emerged as a crucial component in aligning large language models with human values. Significant attention has focused on using reward models as a means for fine-tuning generative models. However, the reward models…

Computation and Language · Computer Science · 2026-02-04 · Brian Christian, Hannah Rose Kirk, Jessica A. F. Thompson, Christopher Summerfield, Tsvetomira Dumbalska

A key challenge in reward learning from human input is that desired agent behavior often changes based on context. For example, a robot must adapt to avoid a stove once it becomes hot. We observe that while high-level preferences (e.g.,…

Robotics · Computer Science · 2026-01-14 · Alexandra Forsey-Smerek, Julie Shah, Andreea Bobu

Aligning language models (LMs) with preferences is an important problem in natural language generation. A key challenge is that preferences are typically provided at the sequence level while LM training and generation both occur at the…

Computation and Language · Computer Science · 2025-01-09 · Shentao Yang, Shujian Zhang, Congying Xia, Yihao Feng, Caiming Xiong, Mingyuan Zhou

Direct Preference Optimization (DPO) has become a prominent method for aligning Large Language Models (LLMs) with human preferences. While DPO has enabled significant progress in aligning English LLMs, multilingual preference alignment is…

Computation and Language · Computer Science · 2025-06-06 · Wen Yang, Junhong Wu, Chen Wang, Chengqing Zong, Jiajun Zhang