Related papers: Proximal Algorithms and Temporal Differences for L…
In this paper, we introduce proximal gradient temporal difference learning, which provides a principled way of designing and analyzing true stochastic gradient temporal difference learning algorithms. We show how gradient TD (GTD)…
Temporal difference (TD) learning is a foundational algorithm in reinforcement learning (RL). For nearly forty years, TD learning has served as a workhorse for applied RL as well as a building block for more complex and specialized…
Several problems in modeling and control of stochastically-driven dynamical systems can be cast as regularized semi-definite programs. We examine two such representative problems and show that they can be formulated in a similar manner. The…
Multi-step temporal-difference (TD) learning, where the update targets contain information from multiple time steps ahead, is one of the most popular forms of TD learning for linear function approximation. The reason is that multi-step…
In this paper we develop proximal methods for statistical learning. Proximal point algorithms are useful in statistics and machine learning for obtaining optimization solutions for composite functions. Our approach exploits closed-form…
Many applications using large datasets require efficient methods for minimizing a proximable convex function subject to satisfying a set of linear constraints within a specified tolerance. For this task, we present a proximal projection…
Temporal difference learning (TD) is a simple iterative algorithm used to estimate the value function corresponding to a given policy in a Markov decision process. Although TD is one of the most widely used algorithms in reinforcement…
The latent variable proximal point (LVPP) algorithm is a framework for solving infinite-dimensional variational problems with pointwise inequality constraints. The algorithm is a saddle point reformulation of the Bregman proximal point…
Gradient temporal difference (Gradient TD) algorithms are a popular class of stochastic approximation (SA) algorithms used for policy evaluation in reinforcement learning. Here, we consider Gradient TD algorithms with an additional heavy…
In this paper, we set forth a new vision of reinforcement learning developed by us over the past few years, one that yields mathematically rigorous solutions to longstanding important questions that have remained unresolved: (i) how to…
We analyze several generic proximal splitting algorithms well suited for large-scale convex nonsmooth optimization. We derive sublinear and linear convergence results with new rates on the function value suboptimality or distance to the…
Large sectors of the recent optimization literature have focused over the last decade on the development of optimal stochastic first-order schemes for constrained convex models under progressively relaxed assumptions. Stochastic proximal point is…
This paper considers the robust phase retrieval problem, which can be cast as a nonsmooth and nonconvex optimization problem. We propose a new inexact proximal linear algorithm with the subproblem being solved inexactly. Our contributions…
In contrast with many other convex optimization classes, state-of-the-art semidefinite programming solvers are not yet able to efficiently solve large-scale instances. This work aims to reduce this scalability gap by proposing a novel…
The proximal gradient method is a generic technique introduced to tackle the non-smoothness in optimization problems, wherein the objective function is expressed as the sum of a differentiable convex part and a non-differentiable…
The proximal gradient algorithm for minimizing the sum of a smooth and a nonsmooth convex function often converges linearly even without strong convexity. One common reason is that a multiple of the step length at each iteration may…
Temporal difference (TD) learning algorithms with neural network function parameterization have well-established empirical success in many practical large-scale reinforcement learning tasks. However, theoretical understanding of these…
Proximal algorithms have gained popularity in recent years in large-scale and distributed optimization problems. One such problem is the phase retrieval problem, for which proximal operators have been proposed recently. The phase retrieval…
Decentralized optimization is a powerful paradigm that finds applications in engineering and learning design. This work studies decentralized composite optimization problems with non-smooth regularization terms. Most existing gradient-based…
Temporal difference (TD) learning with linear function approximation (linear TD) is a classic and powerful prediction algorithm in reinforcement learning. While it is well-understood that linear TD converges almost surely to a unique point,…