Related papers: DYNAMIX: RL-based Adaptive Batch Size Optimization…

A Deep Reinforcement Learning Approach to Efficient Distributed Optimization

In distributed optimization, the practical problem-solving performance is essentially sensitive to algorithm selection, parameter setting, problem type and data pattern. Thus, it is often laborious to acquire a highly efficient method for a…

Optimization and Control · Mathematics 2024-01-04 Daokuan Zhu , Tianqi Xu , Jie Lu

Batch-Constrained Reinforcement Learning for Dynamic Distribution Network Reconfiguration

Dynamic distribution network reconfiguration (DNR) algorithms perform hourly status changes of remotely controllable switches to improve distribution system performance. The problem is typically solved by physical model-based control…

Systems and Control · Electrical Eng. & Systems 2020-06-24 Yuanqi Gao , Wei Wang , Jie Shi , Nanpeng Yu

State Regularized Policy Optimization on Data with Dynamics Shift

In many real-world scenarios, Reinforcement Learning (RL) algorithms are trained on data with dynamics shift, i.e., with different underlying environment dynamics. A majority of current methods address such issue by training context…

Machine Learning · Computer Science 2024-02-23 Zhenghai Xue , Qingpeng Cai , Shuchang Liu , Dong Zheng , Peng Jiang , Kun Gai , Bo An

Reinforcement Learning with Discrete Diffusion Policies for Combinatorial Action Spaces

Reinforcement learning (RL) struggles to scale to large, combinatorial action spaces common in many real-world problems. This paper introduces a novel framework for training discrete diffusion models as highly effective policies in these…

Machine Learning · Computer Science 2026-05-21 Haitong Ma , Ofir Nabati , Aviv Rosenberg , Bo Dai , Oran Lang , Craig Boutilier , Na Li , Shie Mannor , Lior Shani , Guy Tenneholtz

Policy Regularized Distributionally Robust Markov Decision Processes with Linear Function Approximation

Decision-making under distribution shift is a central challenge in reinforcement learning (RL), where training and deployment environments differ. We study this problem through the lens of robust Markov decision processes (RMDPs), which…

Machine Learning · Computer Science 2025-10-17 Jingwen Gu , Yiting He , Zhishuai Liu , Pan Xu

Proximal Policy Optimization with Mixed Distributed Training

Instability and slowness are two main problems in deep reinforcement learning. Even if proximal policy optimization (PPO) is the state of the art, it still suffers from these two problems. We introduce an improved algorithm based on…

Machine Learning · Computer Science 2019-10-01 Zhenyu Zhang , Xiangfeng Luo , Tong Liu , Shaorong Xie , Jianshu Wang , Wei Wang , Yang Li , Yan Peng

Distributional constrained reinforcement learning for supply chain optimization

This work studies reinforcement learning (RL) in the context of multi-period supply chains subject to constraints, e.g., on production and inventory. We introduce Distributional Constrained Policy Optimization (DCPO), a novel approach for…

Machine Learning · Computer Science 2023-02-06 Jaime Sabal Bermúdez , Antonio del Rio Chanona , Calvin Tsay

Designing Adaptive Algorithms Based on Reinforcement Learning for Dynamic Optimization of Sliding Window Size in Multi-Dimensional Data Streams

Multi-dimensional data streams, prevalent in applications like IoT, financial markets, and real-time analytics, pose significant challenges due to their high velocity, unbounded nature, and complex inter-dimensional dependencies. Sliding…

Machine Learning · Computer Science 2025-07-10 Abolfazl Zarghani , Sadegh Abedi

Robustness and risk management via distributional dynamic programming

In dynamic programming (DP) and reinforcement learning (RL), an agent learns to act optimally in terms of expected long-term return by sequentially interacting with its environment modeled by a Markov decision process (MDP). More generally…

Machine Learning · Computer Science 2022-01-03 Mastane Achab , Gergely Neu

Reconciling Discrete-Time Mixed Policies and Continuous-Time Relaxed Controls in Reinforcement Learning and Stochastic Control

Reinforcement learning (RL) is currently one of the most prominent methods for optimizing dynamical systems, with breakthrough results across various fields. The framework is based on the concept of a Markov decision process (MDP), leading…

Optimization and Control · Mathematics 2025-11-17 Rene Carmona , Mathieu Lauriere

Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning

Increasingly large imitation learning datasets are being collected with the goal of training foundation models for robotics. However, despite the fact that data selection has been of utmost importance in vision and natural language…

Robotics · Computer Science 2025-02-24 Joey Hejna , Chethan Bhateja , Yichen Jiang , Karl Pertsch , Dorsa Sadigh

A Reinforcement Learning Method for Environments with Stochastic Variables: Post-Decision Proximal Policy Optimization with Dual Critic Networks

This paper presents Post-Decision Proximal Policy Optimization (PDPPO), a novel variation of the leading deep reinforcement learning method, Proximal Policy Optimization (PPO). The PDPPO state transition process is divided into two steps: a…

Machine Learning · Computer Science 2025-11-18 Leonardo Kanashiro Felizardo , Edoardo Fadda , Paolo Brandimarte , Emilio Del-Moral-Hernandez , Mariá Cristina Vasconcelos Nascimento

Data optimization for large batch distributed training of deep neural networks

Distributed training in deep learning (DL) is common practice as data and models grow. The current practice for distributed training of deep neural networks faces the challenges of communication bottlenecks when operating at scale, and…

Machine Learning · Computer Science 2020-12-21 Shubhankar Gahlot , Junqi Yin , Mallikarjun Shankar

Dynamic Optimization of Storage Systems Using Reinforcement Learning Techniques

The exponential growth of data-intensive applications has placed unprecedented demands on modern storage systems, necessitating dynamic and efficient optimization strategies. Traditional heuristics employed for storage performance…

Operating Systems · Computer Science 2025-08-25 Chiyu Cheng , Chang Zhou , Yang Zhao

Distributed Direct Preference Optimization

Preference-based reinforcement learning (RL) is a key paradigm for aligning policies with human judgments, yet its theoretical behavior in distributed settings where preference data are fragmented across heterogeneous users remains poorly…

Machine Learning · Computer Science 2026-05-21 Zhanhong Jiang

DYNAMITE: Dynamic Interplay of Mini-Batch Size and Aggregation Frequency for Federated Learning with Static and Streaming Dataset

Federated Learning (FL) is a distributed learning paradigm that can coordinate heterogeneous edge devices to perform model training without sharing private data. While prior works have focused on analyzing FL convergence with respect to…

Machine Learning · Computer Science 2025-09-09 Weijie Liu , Xiaoxi Zhang , Jingpu Duan , Carlee Joe-Wong , Zhi Zhou , Xu Chen

DBS: Dynamic Batch Size For Distributed Deep Neural Network Training

Synchronous strategies with data parallelism, such as the Synchronous StochasticGradient Descent (S-SGD) and the model averaging methods, are widely utilizedin distributed training of Deep Neural Networks (DNNs), largely owing to itseasy…

Machine Learning · Computer Science 2022-11-04 Qing Ye , Yuhao Zhou , Mingjia Shi , Yanan Sun , Jiancheng Lv

Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes

Policy-based algorithms are among the most widely adopted techniques in model-free RL, thanks to their strong theoretical groundings and good properties in continuous action spaces. Unfortunately, these methods require precise and…

Machine Learning · Computer Science 2023-06-14 Luca Sabbioni , Francesco Corda , Marcello Restelli

Dynamically Optimal Treatment Allocation

Dynamic decisions are pivotal to economic policy making. We show how existing evidence from randomized control trials can be utilized to guide personalized decisions in challenging dynamic environments with budget and capacity constraints.…

Econometrics · Economics 2024-11-26 Karun Adusumilli , Friedrich Geiecke , Claudio Schilter

Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning

Distributionally robust offline reinforcement learning (RL), which seeks robust policy training against environment perturbation by modeling dynamics uncertainty, calls for function approximations when facing large state-action spaces.…

Machine Learning · Computer Science 2025-11-03 Zhishuai Liu , Pan Xu