Related papers: Exploiting Exogenous Structure for Sample-Efficien…

200 papers

Reinforcement learning algorithms are typically designed for generic Markov Decision Processes (MDPs), where any state-action pair can lead to an arbitrary transition distribution. In many practical systems, however, only a subset of the…

Machine Learning · Computer Science 2026-03-05 Davide Maran, Davide Salaorni, Marcello Restelli

Exogenous state variables and rewards can slow reinforcement learning by injecting uncontrolled variation into the reward signal. This paper formalizes exogenous state variables and rewards and shows that if the reward function decomposes…

Machine Learning · Computer Science 2026-01-15 George Trimponias, Thomas G. Dietterich

In real-world reinforcement learning applications the learner's observation space is ubiquitously high-dimensional with both relevant and irrelevant information about the task at hand. Learning from high-dimensional observations has been…

Machine Learning · Computer Science 2022-06-10 Yonathan Efroni, Dylan J. Foster, Dipendra Misra, Akshay Krishnamurthy, John Langford

We address the problem of approximate model minimization for MDPs in which the state is partitioned into endogenous and (much larger) exogenous components. An exogenous state variable is one whose dynamics are independent of the agent's…

Machine Learning · Computer Science 2019-10-01 Rohan Chitnis, Tomás Lozano-Pérez

Learning a Markov Decision Process (MDP) from a fixed batch of trajectories is a non-trivial task whose outcome's quality depends on both the amount and the diversity of the sampled regions of the state-action space. Yet, many MDPs are…

Machine Learning · Computer Science 2022-03-08 Giorgio Angelotti, Nicolas Drougard, Caroline P. C. Chanel

Exogenous state variables and rewards can slow down reinforcement learning by injecting uncontrolled variation into the reward signal. We formalize exogenous state variables and rewards and identify conditions under which an MDP with…

Machine Learning · Computer Science 2018-06-06 Thomas G. Dietterich, George Trimponias, Zhitang Chen

We study reinforcement learning for the optimal control of Branching Markov Decision Processes (BMDPs), a natural extension of (multitype) Branching Markov Chains (BMCs). The state of a (discrete-time) BMC is a collection of entities of…

Machine Learning · Computer Science 2021-06-15 Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, Dominik Wojtczak

A novel reinforcement learning scheme to synthesize policies for continuous-space Markov decision processes (MDPs) is proposed. This scheme enables one to apply model-free, off-the-shelf reinforcement learning algorithms for finite MDPs to…

Systems and Control · Electrical Eng. & Systems 2020-03-03 Abolfazl Lavaei, Fabio Somenzi, Sadegh Soudjani, Ashutosh Trivedi, Majid Zamani

The curse of dimensionality is a widely known issue in reinforcement learning (RL). In the tabular setting where the state space $\mathcal{S}$ and the action space $\mathcal{A}$ are both finite, to obtain a nearly optimal policy with…

Machine Learning · Computer Science 2022-10-28 Bingyan Wang, Yuling Yan, Jianqing Fan

Exogenous MDPs (Exo-MDPs) capture sequential decision-making where uncertainty comes solely from exogenous inputs that evolve independently of the learner's actions. This structure is especially common in operations research applications…

Machine Learning · Computer Science 2026-01-29 Hao Liang, Jiayu Cheng, Sean R. Sinclair, Yali Du

Robust Markov decision processes (r-MDPs) extend MDPs by explicitly modelling epistemic uncertainty about transition dynamics. Learning r-MDPs from interactions with an unknown environment enables the synthesis of robust policies with…

Machine Learning · Computer Science 2025-11-21 Yannik Schnitzer, Alessandro Abate, David Parker

Modern large-scale computing deployments consist of complex applications running over machine clusters. An important issue in these is the offering of elasticity, i.e., the dynamic allocation of resources to applications to meet fluctuating…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-02-13 Konstantinos Lolos, Ioannis Konstantinou, Verena Kantere, Nectarios Koziris

Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (or minimize…

Optimization and Control · Mathematics 2015-07-07 Mahmoud El Chamie, Behcet Acikmese

In this paper, we consider a modified version of the control problem in a model-free Markov decision process (MDP) setting with large state and action spaces. The control problem most commonly addressed in the contemporary literature is to…

Artificial Intelligence · Computer Science 2018-02-01 Ajin George Joseph, Shalabh Bhatnagar

Markov Decision Process (MDP) presents a mathematical framework to formulate the learning processes of agents in reinforcement learning. MDP is limited by the Markovian assumption that a reward only depends on the immediate state and…

Machine Learning · Computer Science 2024-06-04 Bohao Qu, Xiaofeng Cao, Jielong Yang, Hechang Chen, Chang Yi, Ivor W. Tsang, Yew-Soon Ong

We present a long-term intrinsically motivated structure learning method for modeling transition dynamics during controlled interactions between a robot and semi-permanent structures in the world. In particular, we discuss how…

Robotics · Computer Science 2016-07-18 Jay Ming Wong, Roderic A. Grupen

We study a class of multi-stage stochastic programs, which incorporate modeling features from Markov decision processes (MDPs). This class includes structured MDPs with continuous action and state spaces. We extend policy graphs to include…

Machine Learning · Computer Science 2026-04-09 David P. Morton, Oscar Dowson, Bernardo K. Pagnoncelli

Conventional imitation learning assumes access to the actions of demonstrators, but these motor signals are often non-observable in naturalistic settings. Additionally, sequential decision-making behaviors in these settings can deviate from…

Machine Learning · Computer Science 2023-10-31 Aoyang Qin, Feng Gao, Qing Li, Song-Chun Zhu, Sirui Xie

Markov decision processes (MDPs) are used to model a wide variety of applications ranging from game playing over robotics to finance. Their optimal policy typically maximizes the expected sum of rewards given at each step of the decision…

Machine Learning · Computer Science 2025-05-26 Maximilian Nägele, Jan Olle, Thomas Fösel, Remmy Zen, Florian Marquardt

In order to train agents that can quickly adapt to new objectives or reward functions, efficient unsupervised representation learning in sequential decision-making environments can be important. Frameworks such as the Exogenous Block Markov…

Machine Learning · Computer Science 2025-03-18 Alexander Levine, Peter Stone, Amy Zhang