Related papers: Submodular Reinforcement Learning

Scalable Submodular Policy Optimization via Pruned Submodularity Graph

In Reinforcement Learning (abbreviated as RL), an agent interacts with the environment via a set of possible actions, and a reward is generated from some unknown distribution. The task here is to find an optimal set of actions such that the…

Machine Learning · Computer Science 2025-07-21 Aditi Anand, Suman Banerjee, Dildar Ali

Global Reinforcement Learning: Beyond Linear and Convex Rewards via Submodular Semi-gradient Methods

In classic Reinforcement Learning (RL), the agent maximizes an additive objective of the visited states, e.g., a value function. Unfortunately, objectives of this type cannot model many real-world applications such as experiment design,…

Machine Learning · Computer Science 2024-07-16 Riccardo De Santi, Manish Prajapat, Andreas Krause

Learning Diverse Policies with Soft Self-Generated Guidance

Reinforcement learning (RL) with sparse and deceptive rewards is challenging because non-zero rewards are rarely obtained. Hence, the gradient calculated by the agent can be stochastic and without valid information. Recent studies that…

Machine Learning · Computer Science 2024-02-08 Guojian Wang, Faguo Wu, Xiao Zhang, Jianxiang Liu

Multi-Agent Reinforcement Learning with Submodular Reward

In this paper, we study cooperative multi-agent reinforcement learning (MARL) where the joint reward exhibits submodularity, which is a natural property capturing diminishing marginal returns when adding agents to a team. Unlike standard…

Machine Learning · Computer Science 2026-03-10 Wenjing Chen, Chengyuan Qian, Shuo Xing, Yi Zhou, Victoria Crawford

Average Reward Adjusted Discounted Reinforcement Learning: Near-Blackwell-Optimal Policies for Real-World Applications

Although in recent years reinforcement learning has become very popular, the number of successful applications to different kinds of operations research problems is rather scarce. Reinforcement learning is based on the well-studied dynamic…

Machine Learning · Computer Science 2020-04-03 Manuel Schneckenreither

MOPO: Model-based Offline Policy Optimization

Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data. This problem setting offers the promise of utilizing such datasets to acquire policies without any…

Machine Learning · Computer Science 2020-11-24 Tianhe Yu, Garrett Thomas, Lantao Yu, Stefano Ermon, James Zou, Sergey Levine, Chelsea Finn, Tengyu Ma

Intrinsic Reward Policy Optimization for Sparse-Reward Environments

Exploration is essential in reinforcement learning as an agent relies on trial and error to learn an optimal policy. However, when rewards are sparse, naive exploration strategies, like noise injection, are often insufficient. Intrinsic…

Machine Learning · Computer Science 2026-01-30 Minjae Cho, Huy Trong Tran

Value-Free Policy Optimization via Reward Partitioning

Single-trajectory reinforcement learning (RL) methods aim to optimize policies from datasets consisting of (prompt, response, reward) triplets, where scalar rewards are directly available. This supervision format is highly practical, as it…

Machine Learning · Computer Science 2025-12-23 Bilal Faye, Hanane Azzag, Mustapha Lebbah

Trajectory-Oriented Policy Optimization with Sparse Rewards

Mastering deep reinforcement learning (DRL) proves challenging in tasks featuring scant rewards. These limited rewards merely signify whether the task is partially or entirely accomplished, necessitating various exploration actions before…

Machine Learning · Computer Science 2024-04-11 Guojian Wang, Faguo Wu, Xiao Zhang

Safety Modulation: Enhancing Safety in Reinforcement Learning through Cost-Modulated Rewards

Safe Reinforcement Learning (Safe RL) aims to train an RL agent to maximize its performance in real-world environments while adhering to safety constraints, as exceeding safety violation limits can result in severe consequences. In this…

Machine Learning · Computer Science 2025-04-07 Hanping Zhang, Yuhong Guo

Multi-Objective Reward and Preference Optimization: Theory and Algorithms

This thesis develops theoretical frameworks and algorithms that advance constrained reinforcement learning (RL) across control, preference learning, and alignment of large language models. The first contribution addresses constrained Markov…

Machine Learning · Computer Science 2025-12-12 Akhil Agnihotri

REBEL: Reinforcement Learning via Regressing Relative Rewards

While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models.…

Machine Learning · Computer Science 2024-12-11 Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun

Aligning Few-Step Diffusion Models with Dense Reward Difference Learning

Few-step diffusion models enable efficient high-resolution image synthesis but struggle to align with specific downstream objectives due to limitations of existing reinforcement learning (RL) methods in low-step regimes with limited state…

Machine Learning · Computer Science 2026-03-02 Ziyi Zhang, Li Shen, Sen Zhang, Deheng Ye, Yong Luo, Miaojing Shi, Dongjing Shan, Bo Du, Dacheng Tao

Beyond Optimism: Exploration With Partially Observable Rewards

Exploration in reinforcement learning (RL) remains an open challenge. RL algorithms rely on observing rewards to train the agent, and if informative rewards are sparse the agent learns slowly or may not learn at all. To improve exploration…

Machine Learning · Computer Science 2024-11-12 Simone Parisi, Alireza Kazemipour, Michael Bowling

Improving Policy Gradient by Exploring Under-appreciated Rewards

This paper presents a novel form of policy gradient for model-free reinforcement learning (RL) with improved exploration properties. Current policy-based methods use entropy regularization to encourage undirected exploration of the reward…

Machine Learning · Computer Science 2017-03-17 Ofir Nachum, Mohammad Norouzi, Dale Schuurmans

AdaMemento: Adaptive Memory-Assisted Policy Optimization for Reinforcement Learning

In sparse reward scenarios of reinforcement learning (RL), the memory mechanism provides promising shortcuts to policy optimization by reflecting on past experiences like humans. However, current memory-based RL methods simply store and…

Machine Learning · Computer Science 2026-05-28 Renye Yan, Yaozhong Gan, You Wu, Junliang Xing, Ling Liangn, Yeshang Zhu, Yimao Cai

Risk-sensitive Markov Decision Process and Learning under General Utility Functions

Reinforcement Learning (RL) has gained substantial attention across diverse application domains and theoretical investigations. Existing literature on RL theory largely focuses on risk-neutral settings where the decision-maker learns to…

Machine Learning · Computer Science 2024-12-24 Zhengqi Wu, Renyuan Xu

LambdaPO: A Lambda Style Policy Optimization for Reasoning Language Models

Group Relative Policy Optimization (GRPO) has become a cornerstone of modern reinforcement learning alignment, prized for its efficacy in foregoing an explicit value-critic by leveraging reward normalization across sampled trajectory…

Computation and Language · Computer Science 2026-05-29 Redacted by arXiv

CaRL: Learning Scalable Planning Policies with Simple Rewards

We investigate reinforcement learning (RL) for privileged planning in autonomous driving. State-of-the-art approaches for this task are rule-based, but these methods do not scale to the long tail. RL, on the other hand, is scalable and does…

Machine Learning · Computer Science 2025-08-22 Bernhard Jaeger, Daniel Dauner, Jens Beißwenger, Simon Gerstenecker, Kashyap Chitta, Andreas Geiger

Redeeming Intrinsic Rewards via Constrained Optimization

State-of-the-art reinforcement learning (RL) algorithms typically use random sampling (e.g., $\epsilon$-greedy) for exploration, but this method fails on hard exploration tasks like Montezuma's Revenge. To address the challenge of…

Machine Learning · Computer Science 2022-11-21 Eric Chen, Zhang-Wei Hong, Joni Pajarinen, Pulkit Agrawal