Related papers: Processing Network Controls via Deep Reinforcement…

Queueing Network Controls via Deep Reinforcement Learning

Novel advanced policy gradient (APG) methods, such as Trust Region policy optimization and Proximal policy optimization (PPO), have become the dominant reinforcement learning algorithms because of their ease of implementation and good…

Optimization and Control · Mathematics 2022-03-22 J. G. Dai, Mark Gluzman

Multi-Objective Policy Gradients with Topological Constraints

Multi-objective optimization models that encode ordered sequential constraints provide a solution to model various challenging problems including encoding preferences, modeling a curriculum, and enforcing measures of safety. A recently…

Artificial Intelligence · Computer Science 2022-09-16 Kyle Hollins Wray, Stas Tiomkin, Mykel J. Kochenderfer, Pieter Abbeel

A reinforcement learning approach to hybrid control design

In this paper we design hybrid control policies for hybrid systems whose mathematical models are unknown. Our contributions are threefold. First, we propose a framework for modelling the hybrid control design problem as a single Markov…

Systems and Control · Electrical Eng. & Systems 2020-09-03 Meet Gandhi, Atreyee Kundu, Shalabh Bhatnagar

Attention-Enhanced Prioritized Proximal Policy Optimization for Adaptive Edge Caching

This paper tackles the growing issue of excessive data transmission in networks. With increasing traffic, backhaul links and core networks are under significant traffic, leading to the investigation of caching solutions at edge routers.…

Networking and Internet Architecture · Computer Science 2024-10-31 Farnaz Niknia, Ping Wang, Zixu Wang, Aakash Agarwal, Adib S. Rezaei

A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes

The proximal policy optimization (PPO) algorithm stands as one of the most prosperous methods in the field of reinforcement learning (RL). Despite its success, the theoretical understanding of PPO remains deficient. Specifically, it is…

Machine Learning · Computer Science 2023-06-09 Han Zhong, Tong Zhang

Lyapunov-based Safe Policy Optimization for Continuous Control

We study continuous action reinforcement learning problems in which it is crucial that the agent interacts with the environment only through safe policies, i.e., policies that do not take the agent to undesirable situations. We formulate…

Machine Learning · Computer Science 2019-02-13 Yinlam Chow, Ofir Nachum, Aleksandra Faust, Edgar Duenez-Guzman, Mohammad Ghavamzadeh

Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes

To overcome the curses of dimensionality and modeling of Dynamic Programming (DP) methods to solve Markov Decision Process (MDP) problems, Reinforcement Learning (RL) methods are adopted in practice. Contrary to traditional RL algorithms…

Machine Learning · Computer Science 2021-08-24 Arghyadip Roy, Vivek Borkar, Abhay Karandikar, Prasanna Chaporkar

A Structure-aware Online Learning Algorithm for Markov Decision Processes

To overcome the curse of dimensionality and curse of modeling in Dynamic Programming (DP) methods for solving classical Markov Decision Process (MDP) problems, Reinforcement Learning (RL) algorithms are popular. In this paper, we consider…

Machine Learning · Computer Science 2018-11-29 Arghyadip Roy, Vivek Borkar, Abhay Karandikar, Prasanna Chaporkar

Multi-Objective Reward and Preference Optimization: Theory and Algorithms

This thesis develops theoretical frameworks and algorithms that advance constrained reinforcement learning (RL) across control, preference learning, and alignment of large language models. The first contribution addresses constrained Markov…

Machine Learning · Computer Science 2025-12-12 Akhil Agnihotri

The association problem in wireless networks: a Policy Gradient Reinforcement Learning approach

The purpose of this paper is to develop a self-optimized association algorithm based on PGRL (Policy Gradient Reinforcement Learning), which is both scalable, stable and robust. The term robust means that performance degradation in the…

Networking and Internet Architecture · Computer Science 2013-06-12 Richard Combes, Ilham El Bouloumi, Stephane Senecal, Zwi Altman

Operator-Theoretic Foundations and Policy Gradient Methods for General MDPs with Unbounded Costs

Markov decision processes (MDPs) is viewed as an optimization of an objective function over certain linear operators over general function spaces. A new existence result is established for the existence of optimal policies in general MDPs,…

Machine Learning · Computer Science 2026-04-01 Abhishek Gupta, Aditya Mahajan

Convergence of Natural Policy Gradient for a Family of Infinite-State Queueing MDPs

A wide variety of queueing systems can be naturally modeled as infinite-state Markov Decision Processes (MDPs). In the reinforcement learning (RL) context, a variety of algorithms have been developed to learn and optimize these MDPs. At the…

Machine Learning · Computer Science 2025-07-14 Isaac Grosof, Siva Theja Maguluri, R. Srikant

Proximal Policy Optimization with Mixed Distributed Training

Instability and slowness are two main problems in deep reinforcement learning. Even if proximal policy optimization (PPO) is the state of the art, it still suffers from these two problems. We introduce an improved algorithm based on…

Machine Learning · Computer Science 2019-10-01 Zhenyu Zhang, Xiangfeng Luo, Tong Liu, Shaorong Xie, Jianshu Wang, Wei Wang, Yang Li, Yan Peng

Empirical Evaluation of Policy-Based Reinforcement Learning for Dynamic Service Control in an M/M/1 Queue

While reinforcement learning has been increasingly applied to stochastic control, few studies have systematically examined policy-based methods in queuing environments modeled as a semi-Markov decision process (SMDP). To address this gap,…

Optimization and Control · Mathematics 2026-04-28 Joseph Walton, Gabriel Nicolosi

Policy Regularized Distributionally Robust Markov Decision Processes with Linear Function Approximation

Decision-making under distribution shift is a central challenge in reinforcement learning (RL), where training and deployment environments differ. We study this problem through the lens of robust Markov decision processes (RMDPs), which…

Machine Learning · Computer Science 2025-10-17 Jingwen Gu, Yiting He, Zhishuai Liu, Pan Xu

Training Efficient Controllers via Analytic Policy Gradient

Control design for robotic systems is complex and often requires solving an optimization to follow a trajectory accurately. Online optimization approaches like Model Predictive Control (MPC) have been shown to achieve great tracking…

Robotics · Computer Science 2023-05-04 Nina Wiedemann, Valentin Wüest, Antonio Loquercio, Matthias Müller, Dario Floreano, Davide Scaramuzza

Learning to Play Pong using Policy Gradient Learning

Activities in reinforcement learning (RL) revolve around learning the Markov decision process (MDP) model, in particular, the following parameters: state values, V; state-action values, Q; and policy, pi. These parameters are commonly…

Machine Learning · Computer Science 2018-07-24 Somnuk Phon-Amnuaisuk

Reduced Policy Optimization for Continuous Control with Hard Constraints

Recent advances in constrained reinforcement learning (RL) have endowed reinforcement learning with certain safety guarantees. However, deploying existing constrained RL algorithms in continuous control tasks with general hard constraints…

Machine Learning · Computer Science 2023-12-22 Shutong Ding, Jingya Wang, Yali Du, Ye Shi

Adversarial Policy Optimization in Deep Reinforcement Learning

The policy represented by the deep neural network can overfit the spurious features in observations, which hamper a reinforcement learning agent from learning effective policy. This issue becomes severe in high-dimensional state, where the…

Machine Learning · Computer Science 2023-05-01 Md Masudur Rahman, Yexiang Xue

Reinforcement Learning of Markov Decision Processes with Peak Constraints

In this paper, we consider reinforcement learning of Markov Decision Processes (MDP) with peak constraints, where an agent chooses a policy to optimize an objective and at the same time satisfy additional constraints. The agent has to take…

Optimization and Control · Mathematics 2019-12-09 Ather Gattami