Related papers: Optimizing Return Distributions with Distributiona…
In dynamic programming (DP) and reinforcement learning (RL), an agent learns to act optimally in terms of expected long-term return by sequentially interacting with its environment modeled by a Markov decision process (MDP). More generally…
To date, distributional reinforcement learning (distributional RL) methods have exclusively focused on the discounted setting, where an agent aims to optimize a discounted sum of rewards over time. In this work, we extend distributional RL…
What are the functionals of the reward that can be computed and optimized exactly in Markov Decision Processes?In the finite-horizon, undiscounted setting, Dynamic Programming (DP) can only handle these operations efficiently for certain…
Differential Dynamic Programming (DDP) has become a well established method for unconstrained trajectory optimization. Despite its several applications in robotics and controls however, a widely successful constrained version of the…
We consider a general class of Dynamic Programming (DP) problems with non-separable objective functions. We show that for any problem in this class, there exists an augmented-state DP problem which satisfies the Principle of Optimality and…
We introduce a novel class of algorithms to efficiently approximate the unknown return distributions in policy evaluation problems from distributional reinforcement learning (DRL). The proposed distributional dynamic programming algorithms…
With the development of deep learning, Dynamic Portfolio Optimization (DPO) problem has received a lot of attention in recent years, not only in the field of finance but also in the field of deep learning. Some advanced research in recent…
We propose and study a simple model of dynamical redistribution of capital in a diversified portfolio. We consider a hypothetical situation of a portfolio composed of N uncorrelated stocks. Each stock price follows a multiplicative random…
This paper presents a deep reinforcement learning (DRL) framework for dynamic portfolio optimization under market uncertainty and risk. The proposed model integrates a Sharpe ratio-based reward function with direct risk control mechanisms,…
This paper develops a dynamic programming (DP) approach for decentralized stochastic optimal control problems with delayed sharing information patterns, which exhibits the fundamental Properties of classical DP of centralized partially…
This article introduces a novel distributionally robust model predictive control (DRMPC) algorithm for a specific class of controlled dynamical systems where the disturbance multiplies the state and control variables. These classes of…
Approximate dynamic programming is a popular method for solving large Markov decision processes. This paper describes a new class of approximate dynamic programming (ADP) methods- distributionally robust ADP-that address the curse of…
Differential Dynamic Programming (DDP) is an efficient computational tool for solving nonlinear optimal control problems. It was originally designed as a single shooting method and thus is sensitive to the initial guess supplied. This work…
Distributionally robust optimization (DRO) problems are increasingly seen as a viable method to train machine learning models for improved model generalization. These min-max formulations, however, are more difficult to solve. We therefore…
We present a unifying framework for designing and analysing distributional reinforcement learning (DRL) algorithms in terms of recursively estimating statistics of the return distribution. Our key insight is that DRL algorithms can be…
We consider non-standard Markov Decision Processes (MDPs) where the target function is not only a simple expectation of the accumulated reward. Instead, we consider rather general functionals of the joint distribution of terminal state and…
Stochastic and (distributionally) robust optimization problems often become computationally challenging as the number of scenarios or data points increases. Scenario reduction is therefore a key technique for improving tractability. We…
Deep Reinforcement Learning (DRL) algorithms can scale to previously intractable problems. The automation of profit generation in the stock market is possible using DRL, by combining the financial assets price "prediction" step and the…
Differential Dynamic Programming (DDP) is one of the indirect methods for solving an optimal control problem. Several extensions to DDP have been proposed to add stagewise state and control constraints, which can mainly be classified as…
Distributional reinforcement learning, which focuses on learning the entire return distribution instead of only its expectation in standard RL, has demonstrated remarkable success in enhancing performance. Despite these advancements, our…