Related papers: Stabilizing Reinforcement Learning in Differentiab…

A Differential and Pointwise Control Approach to Reinforcement Learning

Reinforcement learning (RL) in continuous state-action spaces remains challenging in scientific computing due to poor sample efficiency and lack of pathwise physical consistency. We introduce Differential Reinforcement Learning…

Machine Learning · Computer Science 2026-02-06 Minh Nguyen, Chandrajit Bajaj

Rectified Robust Policy Optimization for Model-Uncertain Constrained Reinforcement Learning without Strong Duality

The goal of robust constrained reinforcement learning (RL) is to optimize an agent's performance under the worst-case model uncertainty while satisfying safety or resource constraints. In this paper, we demonstrate that strong duality does…

Machine Learning · Computer Science 2025-09-23 Shaocong Ma, Ziyi Chen, Yi Zhou, Heng Huang

Accelerated Policy Learning with Parallel Differentiable Simulation

Deep reinforcement learning can generate complex control policies, but requires large amounts of training data to work effectively. Recent work has attempted to address this issue by leveraging differentiable simulators. However, inherent…

Machine Learning · Computer Science 2022-04-15 Jie Xu, Viktor Makoviychuk, Yashraj Narang, Fabio Ramos, Wojciech Matusik, Animesh Garg, Miles Macklin

DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty

Deep reinforcement learning (RL) has achieved remarkable success, yet its deployment in real-world scenarios is often limited by vulnerability to environmental uncertainties. Distributionally robust RL (DR-RL) algorithms have been proposed…

Machine Learning · Computer Science 2026-04-21 Mingxuan Cui, Duo Zhou, Yuxuan Han, Grani A. Hanasusanto, Qiong Wang, Huan Zhang, Zhengyuan Zhou

Soft Adaptive Policy Optimization

Reinforcement learning (RL) plays an increasingly important role in enhancing the reasoning capabilities of large language models (LLMs), yet stable and performant policy optimization remains challenging. Token-level importance ratios often…

Machine Learning · Computer Science 2025-12-02 Chang Gao, Chujie Zheng, Xiong-Hui Chen, Kai Dang, Shixuan Liu, Bowen Yu, An Yang, Shuai Bai, Jingren Zhou, Junyang Lin

Real-Time Reinforcement Learning for Dynamic Tasks with a Parallel Soft Robot

Closed-loop control remains an open challenge in soft robotics. The nonlinear responses of soft actuators under dynamic loading conditions limit the use of analytic models for soft robot control. Traditional methods of controlling soft…

Robotics · Computer Science 2025-09-25 James Avtges, Jake Ketchum, Millicent Schlafly, Helena Young, Taekyoung Kim, Allison Pinosky, Ryan L. Truby, Todd D. Murphey

Survival of the Fittest: Evolutionary Adaptation of Policies for Environmental Shifts

Reinforcement learning (RL) has been successfully applied to solve the problem of finding obstacle-free paths for autonomous agents operating in stochastic and uncertain environments. However, when the underlying stochastic dynamics of the…

Machine Learning · Computer Science 2024-10-29 Sheryl Paul, Jyotirmoy V. Deshmukh

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and…

Machine Learning · Computer Science 2018-08-10 Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine

Deep Reinforcement Learning with Robust and Smooth Policy

Deep reinforcement learning (RL) has achieved great empirical successes in various domains. However, the large search space of neural networks requires a large amount of data, which makes the current RL algorithms not sample efficient.…

Machine Learning · Computer Science 2020-08-18 Qianli Shen, Yan Li, Haoming Jiang, Zhaoran Wang, Tuo Zhao

State Regularized Policy Optimization on Data with Dynamics Shift

In many real-world scenarios, Reinforcement Learning (RL) algorithms are trained on data with dynamics shift, i.e., with different underlying environment dynamics. A majority of current methods address such issue by training context…

Machine Learning · Computer Science 2024-02-23 Zhenghai Xue, Qingpeng Cai, Shuchang Liu, Dong Zheng, Peng Jiang, Kun Gai, Bo An

Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization

Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments. This discrepancy is commonly viewed as the disturbance in transition dynamics. Many existing…

Machine Learning · Computer Science 2021-12-21 Yufei Kuang, Miao Lu, Jie Wang, Qi Zhou, Bin Li, Houqiang Li

Constrained Reinforcement Learning Under Model Mismatch

Existing studies on constrained reinforcement learning (RL) may obtain a well-performing policy in the training environment. However, when deployed in a real environment, it may easily violate constraints that were originally satisfied…

Machine Learning · Computer Science 2024-05-06 Zhongchang Sun, Sihong He, Fei Miao, Shaofeng Zou

Soft Actor-Critic Algorithms and Applications

Model-free deep reinforcement learning (RL) algorithms have been successfully applied to a range of challenging sequential decision making and control tasks. However, these methods typically suffer from two major challenges: high sample…

Machine Learning · Computer Science 2019-09-16 Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, Sergey Levine

Robust Policy Optimization in Deep Reinforcement Learning

The policy gradient method enjoys the simplicity of the objective where the agent optimizes the cumulative reward directly. Moreover, in the continuous action domain, parameterized distribution of action distribution allows easy control of…

Machine Learning · Computer Science 2022-12-16 Md Masudur Rahman, Yexiang Xue

Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion

Deep reinforcement learning (RL) uses model-free techniques to optimize task-specific control policies. Despite having emerged as a promising approach for complex problems, RL is still hard to use reliably for real-world applications. Apart…

Robotics · Computer Science 2020-02-25 Siddhant Gangapurwala, Alexander Mitchell, Ioannis Havoutis

GEPO: Group Expectation Policy Optimization for Stable Heterogeneous Reinforcement Learning

As single-center computing approaches power constraints, decentralized training becomes essential. However, traditional Reinforcement Learning (RL) methods, crucial for enhancing large model post-training, cannot adapt to decentralized…

Machine Learning · Computer Science 2026-01-30 Han Zhang, Ruibin Zheng, Zexuan Yi, Zhuo Zhang, Hanyang Peng, Hui Wang, Zike Yuan, Cai Ke, Shiwei Chen, Jiacheng Yang, Yangning Li, Xiang Li, Jiangyue Yan, Yaoqi Liu, Liwen Jing, Jiayin Qi, Ruifeng Xu, Binxing Fang, Yue Yu

Neuromorphic Reinforcement Learning for Quadruped Locomotion Control on Uneven Terrain

Reinforcement learning (RL) has enabled robust quadruped locomotion over complex terrain, but most learned controllers are trained offline with backpropagation in massively parallel simulation and deployed as fixed policies, limiting…

Neural and Evolutionary Computing · Computer Science 2026-05-12 Zhuangyu Han, Abhronil Sengupta

Reduced Policy Optimization for Continuous Control with Hard Constraints

Recent advances in constrained reinforcement learning (RL) have endowed reinforcement learning with certain safety guarantees. However, deploying existing constrained RL algorithms in continuous control tasks with general hard constraints…

Machine Learning · Computer Science 2023-12-22 Shutong Ding, Jingya Wang, Yali Du, Ye Shi

Safe Reinforcement Learning for Autonomous Vehicles through Parallel Constrained Policy Optimization

Reinforcement learning (RL) is attracting increasing interests in autonomous driving due to its potential to solve complex classification and control problems. However, existing RL algorithms are rarely applied to real vehicles for two…

Machine Learning · Computer Science 2020-03-04 Lu Wen, Jingliang Duan, Shengbo Eben Li, Shaobing Xu, Huei Peng

Learning Quadruped Locomotion Using Differentiable Simulation

This work explores the potential of using differentiable simulation for learning quadruped locomotion. Differentiable simulation promises fast convergence and stable training by computing low-variance first-order gradients using robot…

Robotics · Computer Science 2024-10-16 Yunlong Song, Sangbae Kim, Davide Scaramuzza