English
Related papers

Related papers: Preference Learning for AI Alignment: a Causal Per…

200 papers

Recent advances in large language models (LLMs) have demonstrated significant progress in performing complex tasks. While Reinforcement Learning from Human Feedback (RLHF) has been effective in aligning LLMs with human preferences, it is…

Machine Learning · Computer Science 2025-05-30 Chaoqi Wang , Zhuokai Zhao , Yibo Jiang , Zhaorun Chen , Chen Zhu , Yuxin Chen , Jiayi Liu , Lizhu Zhang , Xiangjun Fan , Hao Ma , Sinong Wang

Preference alignment in Large Language Models (LLMs) has significantly improved their ability to adhere to human instructions and intentions. However, existing direct alignment algorithms primarily focus on relative preferences and often…

Machine Learning · Computer Science 2025-05-13 Shenao Zhang , Zhihan Liu , Boyi Liu , Yufeng Zhang , Yingxiang Yang , Yongfei Liu , Liyu Chen , Tao Sun , Zhaoran Wang

Recent breakthroughs in artificial intelligence have driven a paradigm shift, where large language models (LLMs) with billions or trillions of parameters are trained on vast datasets, achieving unprecedented success across a series of…

Computation and Language · Computer Science 2024-10-22 Anpeng Wu , Kun Kuang , Minqin Zhu , Yingrong Wang , Yujia Zheng , Kairong Han , Baohong Li , Guangyi Chen , Fei Wu , Kun Zhang

The recent surge of versatile large language models (LLMs) largely depends on aligning increasingly capable foundation models with human intentions by preference learning, enhancing LLMs with excellent applicability and effectiveness in a…

Computation and Language · Computer Science 2024-06-19 Ruili Jiang , Kehai Chen , Xuefeng Bai , Zhixuan He , Juntao Li , Muyun Yang , Tiejun Zhao , Liqiang Nie , Min Zhang

Preference learning is critical for aligning large language models (LLMs) with human values, with the quality of preference datasets playing a crucial role in this process. While existing metrics primarily assess data quality based on…

Machine Learning · Computer Science 2025-03-05 Kexin Huang , Junkang Wu , Ziqian Chen , Xue Wang , Jinyang Gao , Bolin Ding , Jiancan Wu , Xiangnan He , Xiang Wang

Large Language Models (LLMs) exhibit remarkably powerful capabilities. One of the crucial factors to achieve success is aligning the LLM's output with human preferences. This alignment process often requires only a small amount of data to…

Aligning language models with human preferences through reinforcement learning from human feedback is crucial for their safe and effective deployment. The human preference is typically represented through comparison where one response is…

Machine Learning · Computer Science 2025-07-15 Hoang Anh Just , Ming Jin , Anit Sahu , Huy Phan , Ruoxi Jia

Humans use social context to specify preferences over behaviors, i.e. their reward functions. Yet, algorithms for inferring reward models from preference data do not take this social learning view into account. Inspired by pragmatic human…

Machine Learning · Computer Science 2024-05-24 Andi Peng , Yuying Sun , Tianmin Shu , David Abel

Large language models (LLMs) are increasingly deployed via public-facing interfaces to interact with millions of users, each with diverse preferences. Despite this, preference tuning of LLMs predominantly relies on reward models trained…

Computation and Language · Computer Science 2024-12-06 Vishakh Padmakumar , Chuanyang Jin , Hannah Rose Kirk , He He

Recent advances in Large Language Models (LLMs) highlight the need to align their behaviors with human values. A critical, yet understudied, issue is the potential divergence between an LLM's stated preferences (its reported alignment with…

Artificial Intelligence · Computer Science 2025-06-03 Zhuojun Gu , Quan Wang , Shuchu Han

Learning human preferences in language models remains fundamentally challenging, as reward modeling relies on subtle, subjective comparisons or shades of gray rather than clear-cut labels. This study investigates the limits of current…

Computation and Language · Computer Science 2026-04-03 Simona-Vasilica Oprea , Adela Bâra

Causal inference has been a pivotal challenge across diverse domains such as medicine and economics, demanding a complicated integration of human knowledge, mathematical reasoning, and data mining capabilities. Recent advancements in…

Computation and Language · Computer Science 2025-02-11 Jing Ma

Scaling laws have allowed Pre-trained Language Models (PLMs) into the field of causal reasoning. Causal reasoning of PLM relies solely on text-based descriptions, in contrast to causal discovery which aims to determine the causal…

Large foundation models pretrained on raw web-scale data are not readily deployable without additional step of extensive alignment to human preferences. Such alignment is typically done by collecting large amounts of pairwise comparisons…

Machine Learning · Computer Science 2024-06-13 Daiwei Chen , Yi Chen , Aniket Rege , Ramya Korlakai Vinayak

Pluralistic alignment has emerged as a critical frontier in the development of Large Language Models (LLMs), with reward models (RMs) serving as a central mechanism for capturing diverse human values. While benchmarks for general response…

Computation and Language · Computer Science 2026-04-09 Qiyao Ma , Dechen Gao , Rui Cai , Boqi Zhao , Hanchu Zhou , Junshan Zhang , Zhe Zhao

Alignment of large language models (LLMs) typically involves training a reward model on preference data, followed by policy optimization with respect to the reward model. However, optimizing policies with respect to a single reward model…

Machine Learning · Computer Science 2025-07-23 Debangshu Banerjee , Kintan Saha , Aditya Gopalan

Causal learning is the cognitive process of developing the capability of making causal inferences based on available information, often guided by normative principles. This process is prone to errors and biases, such as the illusion of…

Aligning large language models (LLMs) with human intentions has become a critical task for safely deploying models in real-world systems. While existing alignment approaches have seen empirical success, theoretically understanding how these…

Machine Learning · Computer Science 2024-08-08 Shawn Im , Yixuan Li

Aligning large language models (LLMs) to human preferences is challenging in domains where preference data is unavailable. We address the problem of learning reward models for such target domains by leveraging feedback collected from…

Machine Learning · Computer Science 2025-01-03 David Wu , Sanjiban Choudhury

Accurately aligning large language models (LLMs) with human preferences is crucial for informing fair, economically sound, and statistically efficient decision-making processes. However, we argue that the predominant approach for aligning…

Machine Learning · Statistics 2025-08-26 Jiancong Xiao , Ziniu Li , Xingyu Xie , Emily Getzen , Cong Fang , Qi Long , Weijie J. Su
‹ Prev 1 2 3 10 Next ›