Related papers: Quantile Markov Decision Process

200 papers

In the Markov decision process model, policies are usually evaluated by expected cumulative rewards. As this decision criterion is not always suitable, we propose in this paper an algorithm for computing a policy optimal for the quantile…

Artificial Intelligence · Computer Science 2016-12-02 Hugo Gilbert, Paul Weng, Yan Xu

Markov decision processes (MDPs) are used to model a wide variety of applications, ranging from game playing to robotics to finance. Their optimal policy typically maximizes the expected sum of rewards given at each step of the decision…

Machine Learning · Computer Science 2025-05-26 Maximilian Nägele, Jan Olle, Thomas Fösel, Remmy Zen, Florian Marquardt

Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (or minimize…

Optimization and Control · Mathematics 2015-07-07 Mahmoud El Chamie, Behcet Acikmese

Standard Markov decision process (MDP) and reinforcement learning algorithms optimize the policy with respect to the expected gain. We propose an algorithm that optimizes an alternative objective: the probability that the gain is…

Machine Learning · Computer Science 2023-03-06 Vincent Corlay, Jean-Christophe Sibel

In the most common settings of the Markov decision process (MDP), an agent evaluates a policy based on the expectation of the (discounted) sum of rewards. However, in many applications this criterion might not be suitable from two perspectives: first, in risk…

Artificial Intelligence · Computer Science 2017-05-11 Yan Li, Zhaohan Sun

We introduce the notion of quantum Markov decision process (qMDP) as a semantic model of nondeterministic and concurrent quantum programs. It is shown by examples that qMDPs can be used in analysis of quantum algorithms and protocols. We…

Quantum Physics · Physics 2014-07-10 Shenggang Ying, Mingsheng Ying

This paper studies the optimization of Markov decision processes (MDPs) from a risk-seeking perspective, where the risk is measured by conditional value-at-risk (CVaR). The objective is to find a policy that maximizes the long-run CVaR of…

Optimization and Control · Mathematics 2023-12-05 Li Xia, Zhihui Yu, Peter W. Glynn

Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (minimize…

Optimization and Control · Mathematics 2015-07-08 Mahmoud El Chamie, Behcet Acikmese

Markov decision processes (MDPs) are a popular model for performance analysis and optimization of stochastic systems. The parameters of stochastic behavior of MDPs are estimates from empirical observations of a system; their values are not…

Artificial Intelligence · Computer Science 2017-10-26 Dimitri Scheftelowitsch, Peter Buchholz, Vahid Hashemi, Holger Hermanns

Value-at-risk (VaR), also known as quantile, is a crucial risk measure in finance and other fields. However, optimizing VaR metrics in Markov decision processes (MDPs) is challenging because VaR is non-additive and the traditional dynamic…

Optimization and Control · Mathematics 2025-07-31 Li Xia, Jinyan Pan

In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are a standard metric for modeling RL agents' preferences for certain outcomes. This paper proposes a new Q-learning algorithm for quantile optimization in…

Machine Learning · Computer Science 2024-11-01 Jia Lin Hau, Erick Delage, Esther Derman, Mohammad Ghavamzadeh, Marek Petrik

We consider Markov decision processes (MDPs) with multiple limit-average (or mean-payoff) objectives. There exist two different views: (i) the expectation semantics, where the goal is to optimize the expected mean-payoff objective, and (ii)…

Logic in Computer Science · Computer Science 2019-03-14 Krishnendu Chatterjee, Zuzana Křetínská, Jan Křetínský

In this paper, the aim is to develop a quantum counterpart to classical Markov decision processes (MDPs). Firstly, we provide a very general formulation of quantum MDPs with state and action spaces in the quantum domain, quantum…

Quantum Physics · Physics 2024-09-19 Naci Saldi, Sina Sanjari, Serdar Yuksel

In this paper, we consider a finite-horizon Markov decision process (MDP) for which the objective at each stage is to minimize a quantile-based risk measure (QBRM) of the sequence of future costs; we call the overall objective a dynamic…

Optimization and Control · Mathematics 2017-05-10 Daniel R. Jiang, Warren B. Powell

In this paper, we study a mean-variance optimization problem in an infinite horizon discrete time discounted Markov decision process (MDP). The objective is to minimize the variance of system rewards with the constraint of mean performance.…

Optimization and Control · Mathematics 2017-08-24 Li Xia

In this paper we address the problem of decision making within a Markov decision process (MDP) framework where risk and modeling errors are taken into account. Our approach is to minimize a risk-sensitive conditional-value-at-risk (CVaR)…

Artificial Intelligence · Computer Science 2015-06-09 Yinlam Chow, Aviv Tamar, Shie Mannor, Marco Pavone

We consider a dynamic programming (DP) approach to approximately solving an infinite-horizon constrained Markov decision process (CMDP) problem with a fixed initial-state for the expected total discounted-reward criterion with a…

Optimization and Control · Mathematics 2023-08-08 Hyeong Soo Chang

The online Markov decision process (MDP) is a generalization of the classical Markov decision process that incorporates changing reward functions. In this paper, we propose practical online MDP algorithms with policy iteration and…

Machine Learning · Computer Science 2015-10-16 Yao Ma, Hao Zhang, Masashi Sugiyama

This paper investigates the optimization problem of an infinite-stage discrete-time Markov decision process (MDP) with a long-run average metric considering both mean and variance of rewards together. Such a performance metric is important…

Optimization and Control · Mathematics 2020-08-11 Li Xia

We consider a planning problem where the dynamics and rewards of the environment depend on a hidden static parameter referred to as the context. The objective is to learn a strategy that maximizes the accumulated reward across all contexts.…

Machine Learning · Statistics 2015-02-10 Assaf Hallak, Dotan Di Castro, Shie Mannor