Related papers: RLHFless: Serverless Computing for Efficient RLHF

200 papers

Reinforcement Learning from Human Feedback (RLHF) is widely used in Large Language Model (LLM) alignment. Traditional RL can be modeled as a dataflow, where each node represents computation of a neural network (NN) and each edge denotes…

Machine Learning · Computer Science 2024-10-03 Guangming Sheng, Chi Zhang, Zilingfeng Ye, Xibin Wu, Wang Zhang, Ru Zhang, Yanghua Peng, Haibin Lin, Chuan Wu

Reinforcement learning from human feedback (RLHF) has emerged as a central framework for aligning large language models (LLMs) with human preferences. Despite its practical success, RLHF raises fundamental statistical questions because it…

Machine Learning · Statistics 2026-04-06 Pangpang Liu, Chengchun Shi, Will Wei Sun

We present the workflow of Online Iterative Reinforcement Learning from Human Feedback (RLHF) in this technical report, which is widely reported to outperform its offline counterpart by a large margin in the recent large language model…

Machine Learning · Computer Science 2024-11-13 Hanze Dong, Wei Xiong, Bo Pang, Haoxiang Wang, Han Zhao, Yingbo Zhou, Nan Jiang, Doyen Sahoo, Caiming Xiong, Tong Zhang

Recent advancements in Large Language Models (LLMs) have garnered wide attention and led to successful products such as ChatGPT and GPT-4. Their proficiency in adhering to instructions and delivering harmless, helpful, and honest (3H)…

Machine Learning · Computer Science 2023-10-11 Hao Sun

Reinforcement Learning from Human Feedback (RLHF) has shown remarkable success in aligning Large Language Models (LLMs) with human preferences. Traditional RLHF methods rely on a fixed dataset, which often suffers from limited coverage. To…

Machine Learning · Computer Science 2025-10-28 Long-Fei Li, Yu-Yang Qian, Peng Zhao, Zhi-Hua Zhou

State-of-the-art large language models (LLMs) have become indispensable tools for various tasks. However, training LLMs to serve as effective assistants for humans requires careful consideration. A promising approach is reinforcement…

This study explores the scaling properties of Reinforcement Learning from Human Feedback (RLHF) in Large Language Models (LLMs). Although RLHF is considered an important step in post-training of LLMs, its scaling potential is still largely…

Computation and Language · Computer Science 2024-12-10 Zhenyu Hou, Pengfan Du, Yilin Niu, Zhengxiao Du, Aohan Zeng, Xiao Liu, Minlie Huang, Hongning Wang, Jie Tang, Yuxiao Dong

Large Language Models (LLMs) fine-tuned via Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning with Verifiable Rewards (RLVR) significantly improve the alignment of human-AI values, further raising the upper bound…

Artificial Intelligence · Computer Science 2025-10-10 Jian Hu, Xibin Wu, Wei Shen, Jason Klein Liu, Zilin Zhu, Weixun Wang, Songlin Jiang, Haoran Wang, Hao Chen, Bin Chen, Weikai Fang, Xianyu, Yu Cao, Haotian Xu, Yiming Liu

While Reinforcement Learning from Human Feedback (RLHF) effectively aligns pretrained Large Language and Vision-Language Models (LLMs and VLMs) with human preferences, its computational cost and complexity hamper its wider adoption. To…

Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that learns from human feedback instead of relying on an engineered reward function. Building on prior work on the related setting of…

Machine Learning · Computer Science 2025-12-30 Timo Kaufmann, Paul Weng, Viktor Bengs, Eyke Hüllermeier

Reinforcement learning from human feedback (RLHF) has demonstrated effectiveness in aligning large language models (LLMs) with human preferences. However, token-level RLHF suffers from the credit assignment problem over long sequences,…

Computation and Language · Computer Science 2025-02-18 Yekun Chai, Haoran Sun, Huang Fang, Shuohuan Wang, Yu Sun, Hua Wu

Reinforcement Learning from Human Feedback (RLHF) is a pivotal technique for empowering large language model (LLM) applications. Compared with the supervised training process of LLMs, the RLHF training process is much more sophisticated,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-25 Zhiyu Mei, Wei Fu, Kaiwei Li, Guangju Wang, Huanchen Zhang, Yi Wu

Reinforcement Learning from Human Feedback (RLHF) has emerged as a dominant approach for aligning LLM outputs with human preferences. Inspired by the success of RLHF, we study the performance of multiple algorithms that learn from…

We apply preference modeling and reinforcement learning from human feedback (RLHF) to finetune language models to act as helpful and harmless assistants. We find this alignment training improves performance on almost all NLP evaluations,…

Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique to make large language models (LLMs) more capable in complex settings. RLHF proceeds as collecting human preference data, training a reward model on said…

Machine Learning · Computer Science 2024-02-05 Nathan Lambert, Roberto Calandra

Reinforcement Learning from Human Feedback (RLHF) has become an increasingly popular paradigm for training large language models (LLMs) and diffusion models. While existing RLHF training systems have enabled significant progress, they often…

Machine Learning · Computer Science 2025-08-01 Junyu Wu, Weiming Chang, Xiaotao Liu, Guanyou He, Haoqiang Hong, Boqi Liu, Hongtao Tian, Tao Yang, Yunsheng Shi, Feng Lin, Ting Yao

We present RLHFuse, an efficient training system with stage fusion for Reinforcement Learning from Human Feedback (RLHF). Due to the intrinsic nature of RLHF training, i.e., the data skewness in the generation stage and the pipeline bubbles…

Machine Learning · Computer Science 2025-04-23 Yinmin Zhong, Zili Zhang, Bingyang Wu, Shengyu Liu, Yukun Chen, Changyi Wan, Hanpeng Hu, Lei Xia, Ranchen Ming, Yibo Zhu, Xin Jin

With the development of large language models (LLMs), striking a balance between the performance and safety of AI systems has never been more critical. However, the inherent tension between the objectives of helpfulness and harmlessness…

Artificial Intelligence · Computer Science 2023-10-20 Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang

Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment of large language models with human preferences, significantly enhancing the quality of interactions between humans and models. InstructGPT implements RLHF through…

Computation and Language · Computer Science 2023-10-10 Zheng Yuan, Hongyi Yuan, Chuanqi Tan, Wei Wang, Songfang Huang, Fei Huang

While large language models demonstrate remarkable capabilities, they often present challenges in terms of safety, alignment with human values, and stability during training. Here, we focus on two prevalent methods used to align these…

Computation and Language · Computer Science 2023-10-26 Gabriel Mukobi, Peter Chatain, Su Fong, Robert Windesheim, Gitta Kutyniok, Kush Bhatia, Silas Alberti