Related papers: A Block Coordinate Ascent Algorithm for Mean-Varia…
While deep reinforcement learning has achieved tremendous successes in various applications, most existing works only focus on maximizing the expected value of total return and thus ignore its inherent stochasticity. Such stochasticity is…
Dynamic optimization of mean and variance in Markov decision processes (MDPs) is a long-standing challenge caused by the failure of dynamic programming. In this paper, we propose a new approach to find the globally optimal policy for…
This paper introduces a new functional optimization approach to portfolio optimization problems by treating the unknown weight vector as a function of past values instead of treating them as fixed unknown coefficients in the majority of…
We present an exact algorithm for mean-risk optimization subject to a budget constraint, where decision variables may be continuous or integer. The risk is measured by the covariance matrix and weighted by an arbitrary monotone function,…
Multi-period mean-variance optimization is a long-standing problem, caused by the failure of dynamic programming principle. This paper studies the mean-variance optimization in a setting of finite-horizon discrete-time Markov decision…
We study the block-coordinate forward-backward algorithm in which the blocks are updated in a random and possibly parallel manner, according to arbitrary probabilities. The algorithm allows different stepsizes along the block-coordinates to…
This paper studies the risk-averse mean-variance optimization in infinite-horizon discounted Markov decision processes (MDPs). The involved variance metric concerns reward variability during the whole process, and future deviations are…
In this paper, we study a mean-variance optimization problem in an infinite horizon discrete time discounted Markov decision process (MDP). The objective is to minimize the variance of system rewards with the constraint of mean performance.…
We develop a stochastic approximation-type algorithm to solve finite state/action, infinite-horizon, risk-aware Markov decision processes. Our algorithm has two loops. The inner loop computes the risk by solving a stochastic saddle-point…
Block-coordinate descent (BCD) is a popular framework for large-scale regularized optimization problems with block-separable structure. Existing methods have several limitations. They often assume that subproblems can be solved exactly at…
The Coordinate Ascent Variational Inference scheme is a popular algorithm used to compute the mean-field approximation of a probability distribution of interest. We analyze its random scan version, under log-concavity assumptions on the…
The mean field variational Bayes method is becoming increasingly popular in statistics and machine learning. Its iterative Coordinate Ascent Variational Inference algorithm has been widely applied to large scale Bayesian inference. See Blei…
This paper investigates the optimization problem of an infinite stage discrete time Markov decision process (MDP) with a long-run average metric considering both mean and variance of rewards together. Such performance metric is important…
We describe an approximate dynamic programming approach to compute lower bounds on the optimal value function for a discrete time, continuous space, infinite horizon setting. The approach iteratively constructs a family of lower bounding…
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space. Focusing on a stochastic query model that provides noisy evaluations of the operator, we analyze a variance-reduced stochastic…
In this paper, we propose an inexact block coordinate descent algorithm for large-scale nonsmooth nonconvex optimization problems. At each iteration, a particular block variable is selected and updated by inexactly solving the original…
There are no computationally feasible algorithms that provide solutions to the finite horizon Risk-sensitive Constrained Markov Decision Process (Risk-CMDP) problem, even for problems with moderate horizon. With an aim to design the same,…
In this paper, we consider continuous-time stochastic optimal control problems where the cost is evaluated through a coherent risk measure. We provide an explicit gradient descent-ascent algorithm which applies to problems subject to…
Reinforcement learning considers the problem of finding policies that maximize an expected cumulative reward in a Markov decision process with unknown transition probabilities. In this paper we consider the problem of finding optimal…
This paper develops algorithms for high-dimensional stochastic control problems based on deep learning and dynamic programming. Unlike classical approximate dynamic programming approaches, we first approximate the optimal policy by means of…