Related papers: Thresholded Lexicographic Ordered Multiobjective R…

200 papers

In this work we introduce reinforcement learning techniques for solving lexicographic multi-objective problems. These are problems that involve multiple reward signals, and where the goal is to learn a policy that maximises the first reward…

Machine Learning · Computer Science 2022-12-29 Joar Skalse, Lewis Hammond, Charlie Griffin, Alessandro Abate

Lexicographic multi-objective problems, which consist of multiple conflicting subtasks with explicit priorities, are common in real-world applications. Despite the advantages of Reinforcement Learning (RL) in single tasks, extending…

Machine Learning · Computer Science 2025-11-12 Ruiyu Qiu, Rui Wang, Guanghui Yang, Xiang Li, Zhijiang Shao

In real-world decision optimization, often multiple competing objectives must be taken into account. Following classical reinforcement learning, these objectives have to be combined into a single reward function. In contrast,…

Machine Learning · Computer Science 2022-04-12 Johannes Dornheim

Many advances that have improved the robustness and efficiency of deep reinforcement learning (RL) algorithms can, in one way or another, be understood as introducing additional objectives or constraints in the policy optimization step.…

Finding optimal policies which maximize long term rewards of Markov Decision Processes requires the use of dynamic programming and backward induction to solve the Bellman optimality equation. However, many real-world problems require…

Machine Learning · Computer Science 2023-01-10 Mridul Agarwal, Vaneet Aggarwal

Real-world decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes…

Heuristic algorithms such as simulated annealing, Concorde, and METIS are effective and widely used approaches to find solutions to combinatorial optimization problems. However, they are limited by the high sample complexity required to…

Machine Learning · Computer Science 2019-06-18 Qingpeng Cai, Will Hang, Azalia Mirhoseini, George Tucker, Jingtao Wang, Wei Wei

Solving tasks in Reinforcement Learning is no easy feat. As the goal of the agent is to maximize the accumulated reward, it often learns to exploit loopholes and misspecifications in the reward signal resulting in unwanted behavior. While…

Machine Learning · Computer Science 2018-12-27 Chen Tessler, Daniel J. Mankowitz, Shie Mannor

To overcome the curses of dimensionality and modeling of Dynamic Programming (DP) methods to solve Markov Decision Process (MDP) problems, Reinforcement Learning (RL) methods are adopted in practice. Contrary to traditional RL algorithms…

Machine Learning · Computer Science 2021-08-24 Arghyadip Roy, Vivek Borkar, Abhay Karandikar, Prasanna Chaporkar

Existing studies on constrained reinforcement learning (RL) may obtain a well-performing policy in the training environment. However, when deployed in a real environment, it may easily violate constraints that were originally satisfied…

Machine Learning · Computer Science 2024-05-06 Zhongchang Sun, Sihong He, Fei Miao, Shaofeng Zou

A key challenge in applying reinforcement learning (RL) to diffusion large language models (dLLMs) lies in the intractability of their likelihood functions, which are essential for the RL objective, necessitating corresponding approximation…

Machine Learning · Computer Science 2025-10-15 Nianyi Lin, Jiajie Zhang, Lei Hou, Juanzi Li

For many applications of reinforcement learning it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function. For example, systems that physically interact…

Machine Learning · Computer Science 2017-05-31 Joshua Achiam, David Held, Aviv Tamar, Pieter Abbeel

Improving the multi-step reasoning ability of large language models (LLMs) with offline reinforcement learning (RL) is essential for quickly adapting them to complex tasks. While Direct Preference Optimization (DPO) has shown promise in…

Machine Learning · Computer Science 2024-12-30 Huaijie Wang, Shibo Hao, Hanze Dong, Shenao Zhang, Yilin Bao, Ziran Yang, Yi Wu

An important challenge in multi-objective reinforcement learning is obtaining a Pareto front of policies to attain optimal performance under different preferences. We introduce Iterated Pareto Referent Optimisation (IPRO), which decomposes…

Machine Learning · Computer Science 2025-02-07 Willem Röpke, Mathieu Reymond, Patrick Mannion, Diederik M. Roijers, Ann Nowé, Roxana Rădulescu

Multi-Objective Reinforcement Learning (MORL) presents significant challenges and opportunities for optimizing multiple objectives in Large Language Models (LLMs). We introduce a MORL taxonomy and examine the advantages and limitations of…

Computation and Language · Computer Science 2025-09-29 Lingxiao Kong, Cong Yang, Oya Deniz Beyan, Zeyd Boukhers

Multimodal Large Language Models (MLLMs) excel in vision-language reasoning but often struggle with structured perception tasks requiring precise localization and robustness. We propose a reinforcement learning framework that augments Group…

Computer Vision and Pattern Recognition · Computer Science 2025-10-08 Xu Jia

Blind Face Restoration (BFR) encounters inherent challenges in exploring its large solution space, leading to common artifacts like missing details and identity ambiguity in the restored images. To tackle these challenges, we propose a…

Computer Vision and Pattern Recognition · Computer Science 2025-12-22 Bin Wu, Yahui Liu, Chi Zhang, Yao Zhao, Wei Wang

Reinforcement learning (RL) has emerged as a powerful approach for tackling complex problems. The recent introduction of multi-objective reinforcement learning (MORL) has further expanded the scope of RL by enabling agents to make…

Machine Learning · Computer Science 2023-10-26 Florian Felten, Daniel Gareev, El-Ghazali Talbi, Grégoire Danoy

Reinforcement Learning (RL) has achieved state-of-the-art results in domains such as robotics and games. We build on this previous work by applying RL algorithms to a selection of canonical online stochastic optimization problems with a…

Reinforcement learning with verifiable rewards (RLVR) has become a standard approach for large language models (LLMs) post-training to incentivize reasoning capacity. Among existing recipes, group-based policy gradient is prevalent, which…
