Related papers: GROOT: Corrective Reward Optimization for Generati…

RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment

Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data. Such biases can produce suboptimal samples, skewed outcomes, and unfairness, with potentially serious consequences.…

Machine Learning · Computer Science 2023-12-04 Hanze Dong, Wei Xiong, Deepanshu Goyal, Yihan Zhang, Winnie Chow, Rui Pan, Shizhe Diao, Jipeng Zhang, Kashun Shum, Tong Zhang

Detecting and Suppressing Reward Hacking with Gradient Fingerprints

Reinforcement learning with verifiable rewards (RLVR) typically optimizes for outcome rewards without imposing constraints on intermediate reasoning. This leaves training susceptible to reward hacking, where models exploit loopholes (e.g.,…

Machine Learning · Computer Science 2026-04-20 Songtao Wang, Quang Hieu Pham, Fangcong Yin, Xinpeng Wang, Jocelyn Qiaochu Chen, Greg Durrett, Xi Ye

GRAM: A Generative Foundation Reward Model for Reward Generalization

In aligning large language models (LLMs), reward models have played an important role, but are standardly trained as discriminative models and rely only on labeled human preference data. In this paper, we explore methods that train reward…

Computation and Language · Computer Science 2026-01-27 Chenglong Wang, Yang Gan, Yifu Huo, Yongyu Mu, Qiaozhi He, Murun Yang, Bei Li, Tong Xiao, Chunliang Zhang, Tongran Liu, Jingbo Zhu

GRAM-R$^2$: Self-Training Generative Foundation Reward Models for Reward Reasoning

Significant progress in reward modeling over recent years has been driven by a paradigm shift from task-specific designs towards generalist reward models. Despite this trend, developing effective reward models remains a fundamental…

Computation and Language · Computer Science 2025-11-18 Chenglong Wang, Yongyu Mu, Hang Zhou, Yifu Huo, Ziming Zhu, Jiali Zeng, Murun Yang, Bei Li, Xiaoyang Hao, Chunliang Zhang, Fandong Meng, Jingbo Zhu, Tong Xiao

Generative Representational Instruction Tuning

All text-based language problems can be reduced to either generation or embedding. Current models only perform well at one or the other. We introduce generative representational instruction tuning (GRIT) whereby a large language model is…

Computation and Language · Computer Science 2025-03-04 Niklas Muennighoff, Hongjin Su, Liang Wang, Nan Yang, Furu Wei, Tao Yu, Amanpreet Singh, Douwe Kiela

Exploring Question-Specific Rewards for Generating Deep Questions

Recent question generation (QG) approaches often utilize the sequence-to-sequence framework (Seq2Seq) to optimize the log-likelihood of ground-truth questions using teacher forcing. However, this training objective is inconsistent with…

Computation and Language · Computer Science 2020-11-03 Yuxi Xie, Liangming Pan, Dongzhe Wang, Min-Yen Kan, Yansong Feng

Putting the Horse Before the Cart: A Generator-Evaluator Framework for Question Generation from Text

Automatic question generation (QG) is a useful yet challenging task in NLP. Recent neural network-based approaches represent the state-of-the-art in this task. In this work, we attempt to strengthen them significantly by adopting a holistic…

Computation and Language · Computer Science 2019-09-17 Vishwajeet Kumar, Ganesh Ramakrishnan, Yuan-Fang Li

MemReward: Graph-Based Experience Memory for LLM Reward Prediction with Limited Labels

Reinforcement learning has emerged as a powerful paradigm for improving large language model (LLM) reasoning, where rollouts are sampled from the policy and reward signals computed on those rollouts are used to update the policy. However,…

Machine Learning · Computer Science 2026-05-25 Tianyang Luo, Tao Feng, Zhigang Hua, Yan Xie, Shuang Yang, Ge Liu, Jiaxuan You

Conditional set generation using Seq2seq models

Conditional set generation learns a mapping from an input sequence of tokens to a set. Several NLP tasks, such as entity typing and dialogue emotion tagging, are instances of set generation. Seq2Seq models, a popular choice for set…

Computation and Language · Computer Science 2022-10-25 Aman Madaan, Dheeraj Rajagopal, Niket Tandon, Yiming Yang, Antoine Bosselut

GROOT-2: Weakly Supervised Multi-Modal Instruction Following Agents

Developing agents that can follow multimodal instructions remains a fundamental challenge in robotics and AI. Although large-scale pre-training on unlabeled datasets (no language instruction) has enabled agents to learn diverse behaviors,…

Artificial Intelligence · Computer Science 2024-12-17 Shaofei Cai, Bowei Zhang, Zihao Wang, Haowei Lin, Xiaojian Ma, Anji Liu, Yitao Liang

Generative Verifiers: Reward Modeling as Next-Token Prediction

Verifiers or reward models are often used to enhance the reasoning performance of large language models (LLMs). A common approach is the Best-of-N method, where N candidate solutions generated by the LLM are ranked by a verifier, and the…

Machine Learning · Computer Science 2025-02-25 Lunjun Zhang, Arian Hosseini, Hritik Bansal, Mehran Kazemi, Aviral Kumar, Rishabh Agarwal

In-Loop Meta-Learning with Gradient-Alignment Reward

At the heart of the standard deep learning training loop is a greedy gradient step minimizing a given loss. We propose to add a second step to maximize training generalization. To do this, we optimize the loss of the next training step.…

Machine Learning · Computer Science 2021-02-08 Samuel Müller, André Biedenkapp, Frank Hutter

Augmented Natural Language for Generative Sequence Labeling

We propose a generative framework for joint sequence labeling and sentence-level classification. Our model performs multiple sequence labeling tasks at once using a single, shared natural language output space. Unlike prior discriminative…

Computation and Language · Computer Science 2020-09-29 Ben Athiwaratkun, Cicero Nogueira dos Santos, Jason Krone, Bing Xiang

BERT as a Teacher: Contextual Embeddings for Sequence-Level Reward

Measuring the quality of a generated sequence against a set of references is a central problem in many learning frameworks, be it to compute a score, to assign a reward, or to perform discrimination. Despite great advances in model…

Machine Learning · Computer Science 2020-03-06 Florian Schmidt, Thomas Hofmann

Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation

Recent works have demonstrated that using reinforcement learning (RL) with multiple quality rewards can improve the quality of generated images in text-to-image (T2I) generation. However, manually adjusting reward weights poses challenges…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Seung Hyun Lee, Yinxiao Li, Junjie Ke, Innfarn Yoo, Han Zhang, Jiahui Yu, Qifei Wang, Fei Deng, Glenn Entis, Junfeng He, Gang Li, Sangpil Kim, Irfan Essa, Feng Yang

Learning Generalizable Manipulation Policies with Object-Centric 3D Representations

We introduce GROOT, an imitation learning method for learning robust policies with object-centric and 3D priors. GROOT builds policies that generalize beyond their initial training conditions for vision-based manipulation. It constructs…

Robotics · Computer Science 2023-10-24 Yifeng Zhu, Zhenyu Jiang, Peter Stone, Yuke Zhu

Sequence Generation with Guider Network

Sequence generation with reinforcement learning (RL) has received significant attention recently. However, a challenge with such methods is the sparse-reward problem in the RL training process, in which a scalar guiding signal is often only…

Computation and Language · Computer Science 2018-11-05 Ruiyi Zhang, Changyou Chen, Zhe Gan, Wenlin Wang, Liqun Chen, Dinghan Shen, Guoyin Wang, Lawrence Carin

Selective Off-Policy Reference Tuning with Plan Guidance

Reinforcement learning with verifiable rewards helps reasoning, but GRPO-style methods stall on hard prompts where all sampled rollouts fail. SORT adds a repair update for those failures without changing rollout generation: it derives a…

Artificial Intelligence · Computer Science 2026-05-14 Duc Anh Le, Tien-Phat Nguyen, Thien Huu Nguyen, Linh Ngo Van, Trung Le

Generative Reasoning Re-ranker

Recent studies increasingly explore Large Language Models (LLMs) as a new paradigm for recommendation systems due to their scalability and world knowledge. However, existing work has three key limitations: (1) most efforts focus on…

Information Retrieval · Computer Science 2026-02-24 Mingfu Liang, Yufei Li, Jay Xu, Kavosh Asadi, Xi Liu, Shuo Gu, Kaushik Rangadurai, Frank Shyu, Shuaiwen Wang, Song Yang, Zhijing Li, Jiang Liu, Mengying Sun, Fei Tian, Xiaohan Wei, Chonglin Sun, Jacob Tao, Shike Mei, Wenlin Chen, Santanu Kolay, Sandeep Pandey, Hamed Firooz, Luke Simon

GraphGPT: Graph Instruction Tuning for Large Language Models

Graph Neural Networks (GNNs) have evolved to understand graph structures through recursive exchanges and aggregations among nodes. To enhance robustness, self-supervised learning (SSL) has become a vital tool for data augmentation.…

Computation and Language · Computer Science 2024-05-08 Jiabin Tang, Yuhao Yang, Wei Wei, Lei Shi, Lixin Su, Suqi Cheng, Dawei Yin, Chao Huang