Related papers: Amortized Proximal Optimization

AM-PPO: (Advantage) Alpha-Modulation with Proximal Policy Optimization

Proximal Policy Optimization (PPO) is a widely used reinforcement learning algorithm that heavily relies on accurate advantage estimates for stable and efficient training. However, raw advantage signals can exhibit significant variance,…

Machine Learning · Computer Science 2025-05-22 Soham Sane

Proximal Policy Optimization Algorithms

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent.…

Machine Learning · Computer Science 2017-08-29 John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov

An Adaptive Clipping Approach for Proximal Policy Optimization

Very recently proximal policy optimization (PPO) algorithms have been proposed as first-order optimization methods for effective reinforcement learning. While PPO is inspired by the same learning theory that justifies trust region policy…

Machine Learning · Computer Science 2018-04-20 Gang Chen, Yiming Peng, Mengjie Zhang

Artificial Protozoa Optimizer (APO): A novel bio-inspired metaheuristic algorithm for engineering optimization

This study proposes a novel artificial protozoa optimizer (APO) that is inspired by protozoa in nature. The APO mimics the survival mechanisms of protozoa by simulating their foraging, dormancy, and reproductive behaviors. The APO was…

Neural and Evolutionary Computing · Computer Science 2025-05-07 Xiaopeng Wang, Vaclav Snasel, Seyedali Mirjalili, Jeng-Shyang Pan, Lingping Kong, Hisham A. Shehadeh

Beyond the Boundaries of Proximal Policy Optimization

Proximal policy optimization (PPO) is a widely-used algorithm for on-policy reinforcement learning. This work offers an alternative perspective of PPO, in which it is decomposed into the inner-loop estimation of update vectors, and the…

Machine Learning · Computer Science 2024-11-04 Charlie B. Tan, Edan Toledo, Benjamin Ellis, Jakob N. Foerster, Ferenc Huszár

Transductive Off-policy Proximal Policy Optimization

Proximal Policy Optimization (PPO) is a popular model-free reinforcement learning algorithm, esteemed for its simplicity and efficacy. However, due to its inherent on-policy nature, its proficiency in harnessing data from disparate policies…

Machine Learning · Computer Science 2024-06-07 Yaozhong Gan, Renye Yan, Xiaoyang Tan, Zhe Wu, Junliang Xing

Truly Proximal Policy Optimization

Proximal policy optimization (PPO) is one of the most successful deep reinforcement-learning methods, achieving state-of-the-art performance across a wide range of challenging tasks. However, its optimization behavior is still far from…

Machine Learning · Computer Science 2020-01-15 Yuhui Wang, Hao He, Chao Wen, Xiaoyang Tan

A Unified Approach to Adaptive Regularization in Online and Stochastic Optimization

We describe a framework for deriving and analyzing online optimization algorithms that incorporate adaptive, data-dependent regularization, also termed preconditioning. Such algorithms have been proven useful in stochastic optimization by…

Machine Learning · Computer Science 2017-06-21 Vineet Gupta, Tomer Koren, Yoram Singer

Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws

The composition of pretraining data is a key determinant of foundation models' performance, but there is no standard guideline for allocating a limited computational budget across different data sources. Most current approaches either rely…

Machine Learning · Computer Science 2024-10-16 Yiding Jiang, Allan Zhou, Zhili Feng, Sadhika Malladi, J. Zico Kolter

Proximal Policy Optimization with Adaptive Exploration

Proximal Policy Optimization with Adaptive Exploration (axPPO) is introduced as a novel learning algorithm. This paper investigates the exploration-exploitation tradeoff within the context of reinforcement learning and aims to contribute…

Machine Learning · Computer Science 2024-05-09 Andrei Lixandru

Adversarial Policy Optimization in Deep Reinforcement Learning

The policy represented by the deep neural network can overfit the spurious features in observations, which hampers a reinforcement learning agent from learning an effective policy. This issue becomes severe in high-dimensional state, where the…

Machine Learning · Computer Science 2023-05-01 Md Masudur Rahman, Yexiang Xue

Adaptive Composite Online Optimization: Predictions in Static and Dynamic Environments

In the past few years, Online Convex Optimization (OCO) has received notable attention in the control literature thanks to its flexible real-time nature and powerful performance guarantees. In this paper, we propose new step-size rules and…

Optimization and Control · Mathematics 2023-01-18 Pedro Zattoni Scroccaro, Arman Sharifi Kolarijani, Peyman Mohajerin Esfahani

APO: Alpha-Divergence Preference Optimization

Two divergence regimes dominate modern alignment practice. Supervised fine-tuning and many distillation-style objectives implicitly minimize the forward KL divergence KL(q || pi_theta), yielding stable mode-covering updates but often…

Machine Learning · Computer Science 2025-12-30 Wang Zixian

Proximal Policy Optimization Smoothed Algorithm

Proximal policy optimization (PPO) has yielded state-of-the-art results in policy search, a subfield of reinforcement learning, with one of its key points being the use of a surrogate objective function to restrict the step size at each…

Machine Learning · Computer Science 2020-12-07 Wangshu Zhu, Andre Rosendo

Model-free Optical Processors using In Situ Reinforcement Learning with Proximal Policy Optimization

Optical computing holds promise for high-speed, energy-efficient information processing, with diffractive optical networks emerging as a flexible platform for implementing task-specific transformations. A challenge, however, is the…

Machine Learning · Computer Science 2026-01-05 Yuhang Li, Shiqi Chen, Tingyu Gong, Aydogan Ozcan

Decaying Clipping Range in Proximal Policy Optimization

Proximal Policy Optimization (PPO) is among the most widely used algorithms in reinforcement learning, which achieves state-of-the-art performance in many challenging problems. The keys to its success are the reliable policy updates through…

Machine Learning · Computer Science 2021-07-02 Mónika Farsang, Luca Szegletes

Tutorial on amortized optimization

Optimization is a ubiquitous modeling tool and is often deployed in settings which repeatedly solve similar instances of the same problem. Amortized optimization methods use learning to predict the solutions to problems in these settings,…

Machine Learning · Computer Science 2025-10-07 Brandon Amos

Revisiting Design Choices in Proximal Policy Optimization

Proximal Policy Optimization (PPO) is a popular deep policy gradient algorithm. In standard implementations, PPO regularizes policy updates with clipped probability ratios, and parameterizes policies with either continuous Gaussian…

Machine Learning · Computer Science 2020-09-24 Chloe Ching-Yun Hsu, Celestine Mendler-Dünner, Moritz Hardt

Proximal Algorithms for Smoothed Online Convex Optimization with Predictions

We consider a smoothed online convex optimization (SOCO) problem with predictions, where the learner has access to a finite lookahead window of time-varying stage costs, but suffers a switching cost for changing its actions at each stage.…

Optimization and Control · Mathematics 2023-10-16 Spandan Senapati, Ashwin Shenai, Ketan Rajawat

Adversarial Constrained Policy Optimization: Improving Constrained Reinforcement Learning by Adapting Budgets

Constrained reinforcement learning has achieved promising progress in safety-critical fields where both rewards and constraints are considered. However, constrained reinforcement learning methods face challenges in striking the right…

Machine Learning · Computer Science 2024-10-29 Jianmina Ma, Jingtian Ji, Yue Gao