Related papers: InstructEngine: Instruction-driven Text-to-Image A…

Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences

One-step text-to-image generator models offer advantages such as swift inference efficiency, flexible architectures, and state-of-the-art generation performance. In this paper, we study the problem of aligning one-step generator models with…

Computer Vision and Pattern Recognition · Computer Science 2025-06-06 Weijian Luo

Safer-Instruct: Aligning Language Models with Automated Preference Data

Reinforcement learning from human feedback (RLHF) is a vital strategy for enhancing model capability in language models. However, annotating preference data for RLHF is a resource-intensive and creativity-demanding process, while existing…

Computation and Language · Computer Science 2024-04-02 Taiwei Shi, Kai Chen, Jieyu Zhao

RLTHF: Targeted Human Feedback for LLM Alignment

Fine-tuning large language models (LLMs) to align with user preferences is challenging due to the high cost of quality human annotations in Reinforcement Learning from Human Feedback (RLHF) and the generalizability limitations of AI…

Computation and Language · Computer Science 2025-08-08 Yifei Xu, Tusher Chakraborty, Emre Kıcıman, Bibek Aryal, Eduardo Rodrigues, Srinagesh Sharma, Roberto Estevao, Maria Angels de Luis Balaguer, Jessica Wolk, Rafael Padilha, Leonardo Nunes, Shobana Balakrishnan, Songwu Lu, Ranveer Chandra

Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment

Aligning human preference and value is an important requirement for contemporary foundation models. State-of-the-art techniques such as Reinforcement Learning from Human Feedback (RLHF) often consist of two stages: 1) supervised fine-tuning…

Artificial Intelligence · Computer Science 2024-10-29 Jiaxiang Li, Siliang Zeng, Hoi-To Wai, Chenliang Li, Alfredo Garcia, Mingyi Hong

Maximizing the efficiency of human feedback in AI alignment: a comparative analysis

Reinforcement Learning from Human Feedback (RLHF) relies on preference modeling to align machine learning systems with human values, yet the popular approach of random pair sampling with Bradley-Terry modeling is statistically limited and…

Human-Computer Interaction · Computer Science 2025-12-02 Andreas Chouliaras, Dimitris Chatzopoulos

MapReduce LoRA: Advancing the Pareto Front in Multi-Preference Optimization for Generative Models

Reinforcement learning from human feedback (RLHF) with reward models has advanced alignment of generative models to human aesthetic and perceptual preferences. However, jointly optimizing multiple rewards often incurs an alignment tax,…

Computer Vision and Pattern Recognition · Computer Science 2026-03-17 Chieh-Yun Chen, Zhonghao Wang, Qi Chen, Zhifan Ye, Min Shi, Yue Zhao, Yinan Zhao, Hui Qu, Wei-An Lin, Yiru Shen, Ajinkya Kale, Irfan Essa, Humphrey Shi

Learning Reward and Policy Jointly from Demonstration and Preference Improves Alignment

Aligning human preference and value is an important requirement for building contemporary foundation models and embodied AI. However, popular approaches such as reinforcement learning with human feedback (RLHF) break down the task into…

Artificial Intelligence · Computer Science 2024-12-03 Chenliang Li, Siliang Zeng, Zeyi Liao, Jiaxiang Li, Dongyeop Kang, Alfredo Garcia, Mingyi Hong

RRHF: Rank Responses to Align Language Models with Human Feedback without tears

Reinforcement Learning from Human Feedback (RLHF) facilitates the alignment of large language models with human preferences, significantly enhancing the quality of interactions between humans and models. InstructGPT implements RLHF through…

Computation and Language · Computer Science 2023-10-10 Zheng Yuan, Hongyi Yuan, Chuanqi Tan, Wei Wang, Songfang Huang, Fei Huang

EditHF-1M: A Million-Scale Rich Human Preference Feedback for Image Editing

Recent text-guided image editing (TIE) models have achieved remarkable progress, while many edited images still suffer from issues such as artifacts, unexpected editings, unaesthetic contents. Although some benchmarks and methods have been…

Computer Vision and Pattern Recognition · Computer Science 2026-03-17 Zitong Xu, Huiyu Duan, Zhongpeng Ji, Xinyun Zhang, Yutao Liu, Xiongkuo Min, Ke Gu, Jian Zhang, Shusong Xu, Jinwei Chen, Bo Li, Guangtao Zhai

ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation

We present a comprehensive solution to learn and improve text-to-image models from human preference feedback. To begin with, we build ImageReward -- the first general-purpose text-to-image human preference reward model -- to effectively…

Computer Vision and Pattern Recognition · Computer Science 2023-12-29 Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, Yuxiao Dong

Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF

Reinforcement Learning from Human Feedback (RLHF) is a pivotal technique that aligns language models closely with human-centric values. The initial phase of RLHF involves learning human values using a reward model from ranking data. It is…

Machine Learning · Computer Science 2024-01-30 Banghua Zhu, Michael I. Jordan, Jiantao Jiao

Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback

The success of AI assistants based on Language Models (LLMs) hinges on Reinforcement Learning from Human Feedback (RLHF) to comprehend and align with user intentions. However, traditional alignment algorithms, such as PPO, are hampered by…

Computation and Language · Computer Science 2024-07-03 Songyang Gao, Qiming Ge, Wei Shen, Shihan Dou, Junjie Ye, Xiao Wang, Rui Zheng, Yicheng Zou, Zhi Chen, Hang Yan, Qi Zhang, Dahua Lin

ClaHF: A Human Feedback-inspired Reinforcement Learning Framework for Improving Classification Tasks

Text classification models are typically trained via supervised fine-tuning (SFT). However, SFT essentially performs behavior cloning from instance-wise labels and thus fails to adequately capture relative preference relations among…

Machine Learning · Computer Science 2026-05-19 Tianxiang Xu, Xiaoyan Zhu, Xin Lai, Jiayin Wang

Towards Understanding the Influence of Reward Margin on Preference Model Performance

Reinforcement Learning from Human Feedback (RLHF) is a widely used framework for the training of language models. However, the process of using RLHF to develop a language model that is well-aligned presents challenges, especially when it…

Computation and Language · Computer Science 2024-04-09 Bowen Qin, Duanyu Feng, Xi Yang

T2I-Eval-R1: Reinforcement Learning-Driven Reasoning for Interpretable Text-to-Image Evaluation

The rapid progress in diffusion-based text-to-image (T2I) generation has created an urgent need for interpretable automatic evaluation methods that can assess the quality of generated images, therefore reducing the human annotation burden.…

Artificial Intelligence · Computer Science 2025-05-26 Zi-Ao Ma, Tian Lan, Rong-Cheng Tu, Shu-Hang Liu, Heyan Huang, Zhijing Wu, Chen Xu, Xian-Ling Mao

MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions

Reinforcement learning from human feedback (RLHF) has demonstrated effectiveness in aligning large language models (LLMs) with human preferences. However, token-level RLHF suffers from the credit assignment problem over long sequences,…

Computation and Language · Computer Science 2025-02-18 Yekun Chai, Haoran Sun, Huang Fang, Shuohuan Wang, Yu Sun, Hua Wu

Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation

Recent studies have demonstrated the exceptional potentials of leveraging human preference datasets to refine text-to-image generative models, enhancing the alignment between generated images and textual prompts. Despite these advances,…

Computer Vision and Pattern Recognition · Computer Science 2024-04-24 Xun Wu, Shaohan Huang, Furu Wei

Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback

Reinforcement learning from human feedback (RLHF) has proven effective in enhancing the instruction-following capabilities of large language models; however, it remains underexplored in the cross-modality domain. As the number of modalities…

Artificial Intelligence · Computer Science 2024-12-31 Jiaming Ji, Jiayi Zhou, Hantao Lou, Boyuan Chen, Donghai Hong, Xuyao Wang, Wenqi Chen, Kaile Wang, Rui Pan, Jiahao Li, Mohan Wang, Josef Dai, Tianyi Qiu, Hua Xu, Dong Li, Weipeng Chen, Jun Song, Bo Zheng, Yaodong Yang

Fine-tuning Language Models with Generative Adversarial Reward Modelling

Reinforcement Learning with Human Feedback (RLHF) has been demonstrated to significantly enhance the performance of large language models (LLMs) by aligning their outputs with desired human values through instruction tuning. However, RLHF…

Computation and Language · Computer Science 2024-03-06 Zhang Ze Yu, Lau Jia Jaw, Zhang Hui, Bryan Kian Hsiang Low

Optimizing Prompts for Text-to-Image Generation

Well-designed prompts can guide text-to-image models to generate amazing images. However, the performant prompts are often model-specific and misaligned with user input. Instead of laborious human engineering, we propose prompt adaptation,…

Computation and Language · Computer Science 2024-01-01 Yaru Hao, Zewen Chi, Li Dong, Furu Wei