Related papers: Policy Gradient-based Algorithms for Continuous-ti…
We consider the Linear-Quadratic-Regulator (LQR) problem in terms of optimizing a real-valued matrix function over the set of feedback gains. Such a setup facilitates examining the implications of a natural initial-state independent…
We explore reinforcement learning methods for finding the optimal policy in the linear quadratic regulator (LQR) problem. In particular, we consider the convergence of policy gradient methods in the setting of known and unknown parameters.…
Policy gradient algorithms are widely used in reinforcement learning and belong to the class of approximate dynamic programming methods. This paper studies two key policy gradient algorithms, the Natural Policy Gradient and the Gauss-Newton…
We consider policy gradient algorithms for the indefinite least squares stationary optimal control, e.g., linear-quadratic-regulator (LQR) with indefinite state and input penalization matrices. Such a setup has important applications in…
A gradient-based method is proposed for solving the linear quadratic regulator (LQR) problem for linear systems with nonlinear dependence on time-invariant probabilistic parametric uncertainties. The approach explicitly accounts for model…
Policy gradient methods are a powerful family of reinforcement learning algorithms for continuous control that optimize a policy directly. However, standard first-order methods often converge slowly. Second-order methods can accelerate…
Despite its nonconvexity, policy optimization for the Linear Quadratic Regulator (LQR) admits a favorable structural property known as gradient dominance, which facilitates linear convergence of policy gradient methods to the globally…
In this work we study the convergence of gradient methods for nonconvex optimization problems -- specifically the effect of the problem formulation on the convergence behavior of the solution of a gradient flow. We show through a simple…
The Linear Quadratic Regulator (LQR) is a cornerstone of optimal control theory, widely studied in both model-based and model-free approaches. Despite its well-established nature, certain foundational aspects remain subtle. In this paper,…
Consider a discrete-time Linear Quadratic Regulator (LQR) problem solved using policy gradient descent when the system matrices are unknown. The gradient is transmitted across a noisy channel over a finite time horizon using analog…
A classical approach for solving discrete time nonlinear control on a finite horizon consists in repeatedly minimizing linear quadratic approximations of the original problem around current candidate solutions. While widely popular in many…
Motivated by recent advances of reinforcement learning and direct data-driven control, we propose policy gradient adaptive control (PGAC) for the linear quadratic regulator (LQR), which uses online closed-loop data to improve the control…
Direct policy gradient methods for reinforcement learning and continuous control problems are a popular approach for a variety of reasons: 1) they are easy to implement without explicit knowledge of the underlying model; 2) they are an…
Motivated by the growing use of artificial intelligence (AI) tools in control design, this paper analyses the intersection between results from gradient methods for the model-free linear quadratic regulator (LQR), and linear feedforward…
Given the strong performance of policy gradient (PG) methods in reinforcement learning, their convergence theory has attracted growing interest. Meanwhile, the significant importance and abundant theoretical…
The convergence of policy gradient algorithms hinges on the optimization landscape of the underlying optimal control problem. Theoretical insights into these algorithms can often be acquired from analyzing those of linear quadratic control.…
Stability is one of the most fundamental requirements for systems synthesis. In this paper, we address the stabilization problem for unknown linear systems via policy gradient (PG) methods. We leverage a key feature of PG for Linear…
Nonlinear control systems with partial information to the decision maker are prevalent in a variety of applications. As a step toward studying such nonlinear systems, this work explores reinforcement learning methods for finding the optimal…
Flow $Q$-learning has recently been introduced to integrate learning from expert demonstrations into an actor-critic structure. Central to this innovation is the "one-step policy" network, which is optimized through a $Q$-function…
Policy gradient (PG) methods are the backbone of many reinforcement learning algorithms due to their good performance in policy optimization problems. As a gradient-based approach, PG methods typically rely on knowledge of the system…