Related papers: REFA: Reference Free Alignment for multi-preferenc…

200 papers

Standard human preference-based alignment methods, such as Reinforcement Learning from Human Feedback (RLHF), are a cornerstone for aligning large language models (LLMs) with human values. However, these methods typically assume that…

Artificial Intelligence · Computer Science · 2026-03-02 · Xiaoyang Cao, Zelai Xu, Mo Guang, Kaiwen Long, Michiel A. Bakker, Yu Wang, Chao Yu

Aligning language models with human preferences is crucial for reducing errors and biases in these models. Alignment techniques, such as reinforcement learning from human feedback (RLHF), are typically cast as optimizing a tradeoff between…

Aligning large language models (LLMs) with human preferences is critical for real-world deployment, yet existing methods like RLHF face computational and stability challenges. While DPO establishes an offline paradigm with single…

Machine Learning · Computer Science · 2025-10-28 · Junkang Wu, Kexin Huang, Xue Wang, Jinyang Gao, Bolin Ding, Jiancan Wu, Xiangnan He, Xiang Wang

Reinforcement learning from human feedback (RLHF) has been crucial in aligning large language models (LLMs) with human values. Traditionally, RLHF involves generating responses to a query and using a reward model to assign a reward to the…

Computation and Language · Computer Science · 2024-12-04 · Wenxuan Zhou, Shujian Zhang, Lingxiao Zhao, Tao Meng

Recently, Large Language Models (LLMs) have rapidly evolved, approaching Artificial General Intelligence (AGI) while benefiting from large-scale reinforcement learning to enhance Human Alignment (HA) and Reasoning. Recent reward-based…

Machine Learning · Computer Science · 2025-06-19 · Xuerui Su, Shufang Xie, Guoqing Liu, Yingce Xia, Renqian Luo, Peiran Jin, Zhiming Ma, Yue Wang, Zun Wang, Yuting Liu

Direct alignment algorithms (DAAs), such as direct preference optimization (DPO), have become popular alternatives for Reinforcement Learning from Human Feedback (RLHF) due to their simplicity, efficiency, and stability. However, the…

Machine Learning · Computer Science · 2024-10-15 · Jongwoo Ko, Saket Dingliwal, Bhavana Ganesh, Sailik Sengupta, Sravan Bodapati, Aram Galstyan

Reinforcement Learning (RL) in environments with complex, history-dependent reward structures poses significant challenges for traditional methods. In this work, we introduce a novel approach that leverages automaton-based feedback to guide…

Machine Learning · Computer Science · 2025-10-20 · Mahyar Alinejad, Alvaro Velasquez, Yue Wang, George Atia

Instruction following (IF) is a critical capability for large language models (LLMs). However, handling complex instructions with multiple constraints remains challenging. Previous methods typically select preference pairs based on the…

Computation and Language · Computer Science · 2025-05-29 · Xiang Huang, Ting-En Lin, Feiteng Fang, Yuchuan Wu, Hangyu Li, Yuzhong Qu, Fei Huang, Yongbin Li

Direct Preference Optimization (DPO) and its variants have become the de facto standards for aligning large language models (LLMs) with human preferences or specific goals. However, DPO requires high-quality preference data and suffers from…

Machine Learning · Computer Science · 2024-11-12 · Zhuotong Chen, Fang Liu, Jennifer Zhu, Wanyu Du, Yanjun Qi

Preference-based alignment like Reinforcement Learning from Human Feedback (RLHF) learns from pairwise preferences, yet the labels are often noisy and inconsistent. Existing uncertainty-aware approaches weight preferences, but ignore a more…

Machine Learning · Computer Science · 2026-01-27 · Tiejin Chen, Xiaoou Liu, Vishnu Nandam, Kuan-Ru Liou, Hua Wei

Aligning language models to human expectations, e.g., being helpful and harmless, has become a pressing challenge for large language models. A typical alignment procedure consists of supervised fine-tuning and preference learning. Most…

Machine Learning · Computer Science · 2024-02-27 · Tianchi Cai, Xierui Song, Jiyan Jiang, Fei Teng, Jinjie Gu, Guannan Zhang

Self-play alignment has emerged as an effective approach for fine-tuning large language models (LLMs), formulating preference optimization as a two-player game. However, the regularization with respect to the reference policy, which is…

Machine Learning · Computer Science · 2025-07-09 · Xiaohang Tang, Sangwoong Yoon, Seongho Son, Huizhuo Yuan, Quanquan Gu, Ilija Bogunovic

Reinforcement Learning from Human Feedback (RLHF) has proven effective in aligning large language models with human intentions, yet it often relies on complex methodologies like Proximal Policy Optimization (PPO) that require extensive…

Computation and Language · Computer Science · 2024-08-30 · Han Xia, Songyang Gao, Qiming Ge, Zhiheng Xi, Qi Zhang, Xuanjing Huang

Recent advancements in large language model alignment leverage token-level supervisions to perform fine-grained preference optimization. However, existing token-level alignment methods either optimize on all available tokens, which can be…

Computation and Language · Computer Science · 2025-11-07 · Kailai Yang, Zhiwei Liu, Qianqian Xie, Jimin Huang, Erxue Min, Sophia Ananiadou

Reinforcement Learning from Human Feedback (RLHF) has advanced alignment capabilities significantly but remains hindered by two core challenges: reward hacking and stable optimization. Current solutions independently…

Machine Learning · Computer Science · 2026-02-13 · Li He, Qiang Qu, He Zhao, Stephen Wan, Dadong Wang, Lina Yao, Tongliang Liu

The effectiveness of reinforcement learning (RL) agents in continuous control robotics tasks is mainly dependent on the design of the underlying reward function, which is highly prone to reward hacking. A misalignment between the reward…

Reinforcement Learning from Human Feedback (RLHF) has become a cornerstone for aligning large language models (LLMs) with human values. However, existing approaches struggle to capture the multi-dimensional, distributional nuances of human…

Computation and Language · Computer Science · 2025-05-20 · Zelei Cheng, Xin-Qiang Cai, Yuting Tang, Pushi Zhang, Boming Yang, Masashi Sugiyama, Xinyu Xing

Feedback Alignment (FA) methods are biologically inspired local learning rules for training neural networks with reduced communication between layers. While FA has potential applications in distributed and privacy-aware ML, limitations in…

Machine Learning · Computer Science · 2024-06-05 · Zachary Robertson, Oluwasanmi Koyejo

Low-Rank Adaptation (LoRA) has become one of the most widely used fine-tuning mechanisms for adapting large language models to new domains, tasks, and users. Yet adaptation performance alone can obscure an important failure mode: LoRA…

Computation and Language · Computer Science · 2026-05-29 · Runze Xu, Arpit Garg, Hemanth Saratchandran, Simon Lucey

Rerankers play a pivotal role in refining retrieval results for Retrieval-Augmented Generation. However, current reranking models are typically optimized on static human annotated relevance labels in isolation, decoupled from the downstream…

Computation and Language · Computer Science · 2026-04-03 · Yuhang Wu, Xiangqing Shen, Fanfan Wang, Cangqi Zhou, Zhen Wu, Xinyu Dai, Rui Xia