Related papers: Thresholded Lexicographic Ordered Multiobjective R…

Lexicographic Multi-Objective Reinforcement Learning

In this work we introduce reinforcement learning techniques for solving lexicographic multi-objective problems. These are problems that involve multiple reward signals, and where the goal is to learn a policy that maximises the first reward…

Machine Learning · Computer Science 2022-12-29 Joar Skalse, Lewis Hammond, Charlie Griffin, Alessandro Abate

LPPG-RL: Lexicographically Projected Policy Gradient Reinforcement Learning with Subproblem Exploration

Lexicographic multi-objective problems, which consist of multiple conflicting subtasks with explicit priorities, are common in real-world applications. Despite the advantages of Reinforcement Learning (RL) in single tasks, extending…

Machine Learning · Computer Science 2025-11-12 Ruiyu Qiu, Rui Wang, Guanghui Yang, Xiang Li, Zhijiang Shao

gTLO: A Generalized and Non-linear Multi-Objective Deep Reinforcement Learning Approach

In real-world decision optimization, often multiple competing objectives must be taken into account. Following classical reinforcement learning, these objectives have to be combined into a single reward function. In contrast,…

Machine Learning · Computer Science 2022-04-12 Johannes Dornheim

On Multi-objective Policy Optimization as a Tool for Reinforcement Learning: Case Studies in Offline RL and Finetuning

Many advances that have improved the robustness and efficiency of deep reinforcement learning (RL) algorithms can, in one way or another, be understood as introducing additional objectives or constraints in the policy optimization step.…

Machine Learning · Computer Science 2023-08-02 Abbas Abdolmaleki, Sandy H. Huang, Giulia Vezzani, Bobak Shahriari, Jost Tobias Springenberg, Shruti Mishra, Dhruva TB, Arunkumar Byravan, Konstantinos Bousmalis, Andras Gyorgy, Csaba Szepesvari, Raia Hadsell, Nicolas Heess, Martin Riedmiller

Reinforcement Learning for Joint Optimization of Multiple Rewards

Finding optimal policies which maximize long term rewards of Markov Decision Processes requires the use of dynamic programming and backward induction to solve the Bellman optimality equation. However, many real-world problems require…

Machine Learning · Computer Science 2023-01-10 Mridul Agarwal, Vaneet Aggarwal

A Practical Guide to Multi-Objective Reinforcement Learning and Planning

Real-world decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes…

Artificial Intelligence · Computer Science 2022-04-22 Conor F. Hayes, Roxana Rădulescu, Eugenio Bargiacchi, Johan Källström, Matthew Macfarlane, Mathieu Reymond, Timothy Verstraeten, Luisa M. Zintgraf, Richard Dazeley, Fredrik Heintz, Enda Howley, Athirai A. Irissappane, Patrick Mannion, Ann Nowé, Gabriel Ramos, Marcello Restelli, Peter Vamplew, Diederik M. Roijers

Reinforcement Learning Driven Heuristic Optimization

Heuristic algorithms such as simulated annealing, Concorde, and METIS are effective and widely used approaches to find solutions to combinatorial optimization problems. However, they are limited by the high sample complexity required to…

Machine Learning · Computer Science 2019-06-18 Qingpeng Cai, Will Hang, Azalia Mirhoseini, George Tucker, Jingtao Wang, Wei Wei

Reward Constrained Policy Optimization

Solving tasks in Reinforcement Learning is no easy feat. As the goal of the agent is to maximize the accumulated reward, it often learns to exploit loopholes and misspecifications in the reward signal resulting in unwanted behavior. While…

Machine Learning · Computer Science 2018-12-27 Chen Tessler, Daniel J. Mankowitz, Shie Mannor

Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes

To overcome the curses of dimensionality and modeling of Dynamic Programming (DP) methods to solve Markov Decision Process (MDP) problems, Reinforcement Learning (RL) methods are adopted in practice. Contrary to traditional RL algorithms…

Machine Learning · Computer Science 2021-08-24 Arghyadip Roy, Vivek Borkar, Abhay Karandikar, Prasanna Chaporkar

Constrained Reinforcement Learning Under Model Mismatch

Existing studies on constrained reinforcement learning (RL) may obtain a well-performing policy in the training environment. However, when deployed in a real environment, it may easily violate constraints that were originally satisfied…

Machine Learning · Computer Science 2024-05-06 Zhongchang Sun, Sihong He, Fei Miao, Shaofeng Zou

Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models

A key challenge in applying reinforcement learning (RL) to diffusion large language models (dLLMs) lies in the intractability of their likelihood functions, which are essential for the RL objective, necessitating corresponding approximation…

Machine Learning · Computer Science 2025-10-15 Nianyi Lin, Jiajie Zhang, Lei Hou, Juanzi Li

Constrained Policy Optimization

For many applications of reinforcement learning it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function. For example, systems that physically interact…

Machine Learning · Computer Science 2017-05-31 Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel

Offline Reinforcement Learning for LLM Multi-Step Reasoning

Improving the multi-step reasoning ability of large language models (LLMs) with offline reinforcement learning (RL) is essential for quickly adapting them to complex tasks. While Direct Preference Optimization (DPO) has shown promise in…

Machine Learning · Computer Science 2024-12-30 Huaijie Wang, Shibo Hao, Hanze Dong, Shenao Zhang, Yilin Bao, Ziran Yang, Yi Wu

Divide and Conquer: Provably Unveiling the Pareto Front with Multi-Objective Reinforcement Learning

An important challenge in multi-objective reinforcement learning is obtaining a Pareto front of policies to attain optimal performance under different preferences. We introduce Iterated Pareto Referent Optimisation (IPRO), which decomposes…

Machine Learning · Computer Science 2025-02-07 Willem Röpke, Mathieu Reymond, Patrick Mannion, Diederik M. Roijers, Ann Nowé, Roxana Rădulescu

Multi-Objective Reinforcement Learning for Large Language Model Optimization: Visionary Perspective

Multi-Objective Reinforcement Learning (MORL) presents significant challenges and opportunities for optimizing multiple objectives in Large Language Models (LLMs). We introduce a MORL taxonomy and examine the advantages and limitations of…

Computation and Language · Computer Science 2025-09-29 Lingxiao Kong, Cong Yang, Oya Deniz Beyan, Zeyd Boukhers

Robust Object Detection for Autonomous Driving via Curriculum-Guided Group Relative Policy Optimization

Multimodal Large Language Models (MLLMs) excel in vision-language reasoning but often struggle with structured perception tasks requiring precise localization and robustness. We propose a reinforcement learning framework that augments Group…

Computer Vision and Pattern Recognition · Computer Science 2025-10-08 Xu Jia

Enhancing Blind Face Restoration through Online Reinforcement Learning

Blind Face Restoration (BFR) encounters inherent challenges in exploring its large solution space, leading to common artifacts like missing details and identity ambiguity in the restored images. To tackle these challenges, we propose a…

Computer Vision and Pattern Recognition · Computer Science 2025-12-22 Bin Wu, Yahui Liu, Chi Zhang, Yao Zhao, Wei Wang

Hyperparameter Optimization for Multi-Objective Reinforcement Learning

Reinforcement learning (RL) has emerged as a powerful approach for tackling complex problems. The recent introduction of multi-objective reinforcement learning (MORL) has further expanded the scope of RL by enabling agents to make…

Machine Learning · Computer Science 2023-10-26 Florian Felten, Daniel Gareev, El-Ghazali Talbi, Grégoire Danoy

ORL: Reinforcement Learning Benchmarks for Online Stochastic Optimization Problems

Reinforcement Learning (RL) has achieved state-of-the-art results in domains such as robotics and games. We build on this previous work by applying RL algorithms to a selection of canonical online stochastic optimization problems with a…

Machine Learning · Computer Science 2019-12-03 Bharathan Balaji, Jordan Bell-Masterson, Enes Bilgin, Andreas Damianou, Pablo Moreno Garcia, Arpit Jain, Runfei Luo, Alvaro Maggiar, Balakrishnan Narayanaswamy, Chun Ye

Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex

Reinforcement learning with verifiable rewards (RLVR) has become a standard approach for large language models (LLMs) post-training to incentivize reasoning capacity. Among existing recipes, group-based policy gradient is prevalent, which…

Machine Learning · Computer Science 2026-05-21 Yun Qu, Qi Wang, Yixiu Mao, Heming Zou, Yuhang Jiang, Yingyue Li, Wutong Xu, Lizhou Cai, Weijie Liu, Clive Bai, Kai Yang, Yangkun Chen, Saiyong Yang, Xiangyang Ji