Related papers: A Concentration Bound for LSPE($\lambda$)
Given an ODE and its perturbation, the Alekseev formula expresses the solutions of the latter in terms related to the former. By exploiting this formula and a new concentration inequality for martingale-differences, we develop a novel…
This paper provides a non-asymptotic analysis of linear stochastic approximation (LSA) algorithms with fixed stepsize. This family of methods arises in many machine learning tasks and is used to obtain approximate solutions of a linear…
In the field of reinforcement learning there has been recent progress towards safety and high-confidence bounds on policy performance. However, to our knowledge, no practical methods exist for determining high-confidence policy performance…
Abstract dynamic programming models are used to analyze $\lambda$-policy iteration with randomization algorithms. Particularly, contractive models with infinite policies are considered and it is shown that well-posedness of the…
We obtain non asymptotic concentration bounds for two kinds of stochastic approximations. We first consider the deviations between the expectation of a given function of the Euler scheme of some diffusion process at a fixed deterministic…
Policy evaluation with linear function approximation is an important problem in reinforcement learning. When facing high-dimensional feature spaces, such a problem becomes extremely hard considering the computation efficiency and quality of…
We consider LSTD($\lambda$), the least-squares temporal-difference algorithm with eligibility traces algorithm proposed by Boyan (2002). It computes a linear approximation of the value function of a fixed policy in a large Markov Decision…
Constraint tightening to non-conservatively guarantee recursive feasibility and stability in Stochastic Model Predictive Control is addressed. Stability and feasibility requirements are considered separately, highlighting the difference…
We study best-policy identification for finite-horizon risk-sensitive reinforcement learning under the entropic risk measure. Recent work established a constant gap in the exponential horizon dependence between lower and upper bounds on the…
In this paper we discuss $\l$-policy iteration, a method for exact and approximate dynamic programming. It is intermediate between the classical value iteration (VI) and policy iteration (PI) methods, and it is closely related to optimistic…
Viewing a two time scale stochastic approximation scheme as a noisy discretization of a singularly perturbed differential equation, we obtain a concentration bound for its iterates that captures its behavior with quantifiable high…
Analyzing probabilistic programs and randomized algorithms are classical problems in computer science. The first basic problem in the analysis of stochastic processes is to consider the expectation or mean, and another basic problem is to…
We revisit the classical model of Tsitsiklis, Bertsekas and Athans for distributed stochastic approximation with consensus. The main result is an analysis of this scheme using the ODE approach to stochastic approximation, leading to a high…
We develop a new framework for deriving time-uniform concentration bounds for the output of stochastic sequential algorithms satisfying certain recursive inequalities akin to those defining the almost-supermartingale processes introduced by…
In this paper, we obtain fundamental $\mathcal{L}_{p}$ bounds in sequential prediction and recursive algorithms via an entropic analysis. Both classes of problems are examined by investigating the underlying entropic relationships of the…
Compressed Sensing algorithms often make use of the hard thresholding operator to pass from dense vectors to their best s-sparse approximations. However, the output of the hard thresholding operator does not depend on any information from a…
During recent years the interest of optimization and machine learning communities in high-probability convergence of stochastic optimization methods has been growing. One of the main reasons for this is that high-probability complexity…
Randomized higher-order computation can be seen as being captured by a lambda calculus endowed with a single algebraic operation, namely a construct for binary probabilistic choice. What matters about such computations is the probability of…
We consider the problem of estimating the Optimized Certainty Equivalent (OCE) risk from independent and identically distributed (i.i.d.) samples. For the classic sample average approximation (SAA) of OCE, we derive mean-squared error as…
Two-timescale Stochastic Approximation (SA) algorithms are widely used in Reinforcement Learning (RL). Their iterates have two parts that are updated using distinct stepsizes. In this work, we develop a novel recipe for their finite sample…