Related papers: Efficient iterative policy optimization

Reward Constrained Policy Optimization

Solving tasks in Reinforcement Learning is no easy feat. As the goal of the agent is to maximize the accumulated reward, it often learns to exploit loopholes and misspecifications in the reward signal resulting in unwanted behavior. While…

Machine Learning · Computer Science 2018-12-27 Chen Tessler, Daniel J. Mankowitz, Shie Mannor

Projection-Based Constrained Policy Optimization

We consider the problem of learning control policies that optimize a reward function while satisfying constraints due to considerations of safety, fairness, or other costs. We propose a new algorithm, Projection-Based Constrained Policy…

Machine Learning · Computer Science 2020-10-08 Tsung-Yen Yang, Justinian Rosca, Karthik Narasimhan, Peter J. Ramadge

Easy Monotonic Policy Iteration

A key problem in reinforcement learning for control with general function approximators (such as deep neural networks and other nonlinear functions) is that, for many algorithms employed in practice, updates to the policy or $Q$-function…

Machine Learning · Computer Science 2016-03-01 Joshua Achiam

Cost-Effective Incentive Allocation via Structured Counterfactual Inference

We address a practical problem ubiquitous in modern marketing campaigns, in which a central agent tries to learn a policy for allocating strategic financial incentives to customers and observes only bandit feedback. In contrast to…

Machine Learning · Statistics 2019-11-12 Romain Lopez, Chenchen Li, Xiang Yan, Junwu Xiong, Michael I. Jordan, Yuan Qi, Le Song

Off-Policy Optimization of Portfolio Allocation Policies under Constraints

The dynamic portfolio optimization problem in finance frequently requires learning policies that adhere to various constraints, driven by investor preferences and risk. We motivate this problem of finding an allocation policy within a…

Artificial Intelligence · Computer Science 2020-12-23 Nymisha Bandi, Theja Tulabandhula

Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees

Constrained decision-making is essential for designing safe policies in real-world control systems, yet simulated environments often fail to capture real-world adversities. We consider the problem of learning a policy that will maximize the…

Machine Learning · Computer Science 2026-02-10 Sourav Ganguly, Kishan Panaganti, Arnob Ghosh, Adam Wierman

Efficient Sampling Policy for Selecting a Good Enough Subset

The note studies the problem of selecting a good enough subset out of a finite number of alternatives under a fixed simulation budget. Our work aims to maximize the posterior probability of correctly selecting a good subset. We formulate…

Optimization and Control · Mathematics 2023-05-09 Gongbo Zhang, Bin Chen, Qing-shan Jia, Yijie Peng

Beyond Worst-case: A Probabilistic Analysis of Affine Policies in Dynamic Optimization

Affine policies (or control) are widely used as a solution approach in dynamic optimization where computing an optimal adjustable solution is usually intractable. While the worst case performance of affine policies can be significantly bad,…

Optimization and Control · Mathematics 2019-10-15 Omar El Housni, Vineet Goyal

Accelerating Safe Reinforcement Learning with Constraint-mismatched Policies

We consider the problem of reinforcement learning when provided with (1) a baseline control policy and (2) a set of constraints that the learner must satisfy. The baseline policy can arise from demonstration data or a teacher agent and may…

Machine Learning · Computer Science 2021-07-13 Tsung-Yen Yang, Justinian Rosca, Karthik Narasimhan, Peter J. Ramadge

Learning from Scarce Experience

Searching the space of policies directly for the optimal policy has been one popular method for solving partially observable reinforcement learning problems. Typically, with each change of the target policy, its value is estimated from the…

Artificial Intelligence · Computer Science 2007-05-23 Leonid Peshkin, Christian R. Shelton

Policy Learning for Balancing Short-Term and Long-Term Rewards

Empirical researchers and decision-makers spanning various domains frequently seek profound insights into the long-term impacts of interventions. While the significance of long-term outcomes is undeniable, an overemphasis on them may…

Machine Learning · Computer Science 2024-09-17 Peng Wu, Ziyu Shen, Feng Xie, Zhongyao Wang, Chunchen Liu, Yan Zeng

Constrained Density Functional Theory Calculation with Iterative Optimization

An iterative optimization approach that simultaneously minimizes the energy and optimizes the Lagrange multipliers enforcing desired constraints is presented. The method is tested on previously established benchmark systems and it is proved…

Computational Physics · Physics 2018-08-15 D. Kidd, A. S. Umar, K. Varga

Policy Optimization Through Approximate Importance Sampling

Recent policy optimization approaches (Schulman et al., 2015a; 2017) have achieved substantial empirical successes by constructing new proxy optimization objectives. These proxy objectives allow stable and low variance policy learning, but…

Machine Learning · Computer Science 2020-02-24 Marcin B. Tomczak, Dongho Kim, Peter Vrancx, Kee-Eung Kim

Local Policy Improvement for Recommender Systems

Recommender systems predict what items a user will interact with next, based on their past interactions. The problem is often approached through supervised learning, but recent advancements have shifted towards policy optimization of…

Machine Learning · Computer Science 2023-04-28 Dawen Liang, Nikos Vlassis

Doubly Optimal Policy Evaluation for Reinforcement Learning

Policy evaluation estimates the performance of a policy by (1) collecting data from the environment and (2) processing raw data into a meaningful estimate. Due to the sequential nature of reinforcement learning, any improper data-collecting…

Machine Learning · Computer Science 2025-03-21 Shuze Daniel Liu, Claire Chen, Shangtong Zhang

Optimistic Value Iteration

Markov decision processes are widely used for planning and verification in settings that combine controllable or adversarial choices with probabilistic behaviour. The standard analysis algorithm, value iteration, only provides a lower bound…

Logic in Computer Science · Computer Science 2019-10-21 Arnd Hartmanns, Benjamin Lucien Kaminski

Policy Gradient Algorithms Implicitly Optimize by Continuation

Direct policy optimization in reinforcement learning is usually solved with policy-gradient algorithms, which optimize policy parameters via stochastic gradient ascent. This paper provides a new theoretical interpretation and justification…

Machine Learning · Computer Science 2023-10-24 Adrien Bolland, Gilles Louppe, Damien Ernst

On convex problems in chance-constrained stochastic model predictive control

We investigate constrained optimal control problems for linear stochastic dynamical systems evolving in discrete time. We consider minimization of an expected value cost over a finite horizon. Hard constraints are introduced first, and then…

Optimization and Control · Mathematics 2011-07-07 Eugenio Cinquemani, Mayank Agarwal, Debasish Chatterjee, John Lygeros

Off-Policy Interval Estimation with Lipschitz Value Iteration

Off-policy evaluation provides an essential tool for evaluating the effects of different policies or treatments using only observed data. When applied to high-stakes scenarios such as medical diagnosis or financial decision-making, it is…

Machine Learning · Computer Science 2020-10-30 Ziyang Tang, Yihao Feng, Na Zhang, Jian Peng, Qiang Liu

Simple Policy Evaluation for Data-Rich Iterative Tasks

A data-based policy for iterative control tasks is presented. The proposed strategy is model-free and can be applied whenever safe input and state trajectories of a system performing an iterative task are available. These trajectories,…

Systems and Control · Computer Science 2019-03-22 Ugo Rosolia, Xiaojing Zhang, Francesco Borrelli