Related papers: Queueing Network Controls via Deep Reinforcement L…

Processing Network Controls via Deep Reinforcement Learning

Novel advanced policy gradient (APG) algorithms, such as proximal policy optimization (PPO), trust region policy optimization, and their variations, have become the dominant reinforcement learning (RL) algorithms because of their ease of…

Optimization and Control · Mathematics 2022-05-05 Mark Gluzman

A reinforcement learning approach to hybrid control design

In this paper we design hybrid control policies for hybrid systems whose mathematical models are unknown. Our contributions are threefold. First, we propose a framework for modelling the hybrid control design problem as a single Markov…

Systems and Control · Electrical Eng. & Systems 2020-09-03 Meet Gandhi , Atreyee Kundu , Shalabh Bhatnagar

Attention-Enhanced Prioritized Proximal Policy Optimization for Adaptive Edge Caching

This paper tackles the growing issue of excessive data transmission in networks. With increasing traffic, backhaul links and core networks are under significant traffic, leading to the investigation of caching solutions at edge routers.…

Networking and Internet Architecture · Computer Science 2024-10-31 Farnaz Niknia , Ping Wang , Zixu Wang , Aakash Agarwal , Adib S. Rezaei

Learning Optimal Admission Control in Partially Observable Queueing Networks

We present an efficient reinforcement learning algorithm that learns the optimal admission control policy in a partially observable queueing network. Specifically, only the arrival and departure times from the network are observable, and…

Machine Learning · Computer Science 2023-08-07 Jonatha Anselmi , Bruno Gaujal , Louis-Sébastien Rebuffi

A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes

The proximal policy optimization (PPO) algorithm stands as one of the most prosperous methods in the field of reinforcement learning (RL). Despite its success, the theoretical understanding of PPO remains deficient. Specifically, it is…

Machine Learning · Computer Science 2023-06-09 Han Zhong , Tong Zhang

Proximal Policy Optimization with Mixed Distributed Training

Instability and slowness are two main problems in deep reinforcement learning. Even if proximal policy optimization (PPO) is the state of the art, it still suffers from these two problems. We introduce an improved algorithm based on…

Machine Learning · Computer Science 2019-10-01 Zhenyu Zhang , Xiangfeng Luo , Tong Liu , Shaorong Xie , Jianshu Wang , Wei Wang , Yang Li , Yan Peng

Truly Proximal Policy Optimization

Proximal policy optimization (PPO) is one of the most successful deep reinforcement-learning methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, its optimization behavior is still far from…

Machine Learning · Computer Science 2020-01-15 Yuhui Wang , Hao He , Chao Wen , Xiaoyang Tan

Deep Reinforcement Learning Approach to QoSAware Load Balancing in 5G Cellular Networks under User Mobility and Observation Uncertainty

Efficient mobility management and load balancing are critical to sustaining Quality of Service (QoS) in dense, highly dynamic 5G radio access networks. We present a deep reinforcement learning framework based on Proximal Policy Optimization…

Networking and Internet Architecture · Computer Science 2026-05-13 Mehrshad Eskandarpour , Hossein Soleimani

Empirical Evaluation of Policy-Based Reinforcement Learning for Dynamic Service Control in an M/M/1 Queue

While reinforcement learning has been increasingly applied to stochastic control, few studies have systematically examined policy-based methods in queuing environments modeled as a semi-Markov decision process (SMDP). To address this gap,…

Optimization and Control · Mathematics 2026-04-28 Joseph Walton , Gabriel Nicolosi

Proximal Policy Optimization Algorithms

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.…

Machine Learning · Computer Science 2017-08-29 John Schulman , Filip Wolski , Prafulla Dhariwal , Alec Radford , Oleg Klimov

PTR-PPO: Proximal Policy Optimization with Prioritized Trajectory Replay

On-policy deep reinforcement learning algorithms have low data utilization and require significant experience for policy improvement. This paper proposes a proximal policy optimization algorithm with prioritized trajectory replay (PTR-PPO)…

Machine Learning · Computer Science 2021-12-09 Xingxing Liang , Yang Ma , Yanghe Feng , Zhong Liu

Lyapunov-based Safe Policy Optimization for Continuous Control

We study continuous action reinforcement learning problems in which it is crucial that the agent interacts with the environment only through safe policies, i.e.,~policies that do not take the agent to undesirable situations. We formulate…

Machine Learning · Computer Science 2019-02-13 Yinlam Chow , Ofir Nachum , Aleksandra Faust , Edgar Duenez-Guzman , Mohammad Ghavamzadeh

An Approximate Ascent Approach To Prove Convergence of PPO

Proximal Policy Optimization (PPO) is among the most widely used deep reinforcement learning algorithms, yet its theoretical foundations remain incomplete. Most importantly, convergence and understanding of fundamental PPO advantages remain…

Machine Learning · Computer Science 2026-02-04 Leif Doering , Daniel Schmidt , Moritz Melcher , Sebastian Kassing , Benedikt Wille , Tilman Aach , Simon Weissmann

Convergence of Natural Policy Gradient for a Family of Infinite-State Queueing MDPs

A wide variety of queueing systems can be naturally modeled as infinite-state Markov Decision Processes (MDPs). In the reinforcement learning (RL) context, a variety of algorithms have been developed to learn and optimize these MDPs. At the…

Machine Learning · Computer Science 2025-07-14 Isaac Grosof , Siva Theja Maguluri , R. Srikant

Multi-Objective Policy Gradients with Topological Constraints

Multi-objective optimization models that encode ordered sequential constraints provide a solution to model various challenging problems including encoding preferences, modeling a curriculum, and enforcing measures of safety. A recently…

Artificial Intelligence · Computer Science 2022-09-16 Kyle Hollins Wray , Stas Tiomkin , Mykel J. Kochenderfer , Pieter Abbeel

Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs

Reinforcement learning (RL) has re-emerged as a natural approach for training interactive LLM agents in real-world environments. However, directly applying the widely used Group Relative Policy Optimization (GRPO) algorithm to multi-turn…

Machine Learning · Computer Science 2026-01-27 Junbo Li , Peng Zhou , Rui Meng , Meet P. Vadera , Lihong Li , Yang Li

Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes

To overcome the curses of dimensionality and modeling of Dynamic Programming (DP) methods to solve Markov Decision Process (MDP) problems, Reinforcement Learning (RL) methods are adopted in practice. Contrary to traditional RL algorithms…

Machine Learning · Computer Science 2021-08-24 Arghyadip Roy , Vivek Borkar , Abhay Karandikar , Prasanna Chaporkar

Minimizing the Outage Probability in a Markov Decision Process

Standard Markov decision process (MDP) and reinforcement learning algorithms optimize the policy with respect to the expected gain. We propose an algorithm which enables to optimize an alternative objective: the probability that the gain is…

Machine Learning · Computer Science 2023-03-06 Vincent Corlay , Jean-Christophe Sibel

Matrix Low-Rank Trust Region Policy Optimization

Most methods in reinforcement learning use a Policy Gradient (PG) approach to learn a parametric stochastic policy that maps states to actions. The standard approach is to implement such a mapping via a neural network (NN) whose parameters…

Machine Learning · Computer Science 2024-05-29 Sergio Rozada , Antonio G. Marques

Multi-Objective Reward and Preference Optimization: Theory and Algorithms

This thesis develops theoretical frameworks and algorithms that advance constrained reinforcement learning (RL) across control, preference learning, and alignment of large language models. The first contribution addresses constrained Markov…

Machine Learning · Computer Science 2025-12-12 Akhil Agnihotri