Related papers: Randomized Optimal Stopping Problem in Continuous …

Exploratory Optimal Stopping: A Singular Control Formulation

This paper explores continuous-time and state-space optimal stopping problems from a reinforcement learning perspective. We begin by formulating the stopping problem using randomized stopping times, where the decision maker's control is…

Optimization and Control · Mathematics 2026-03-12 Jodi Dianetti, Giorgio Ferrari, Renyuan Xu

Reinforcement Learning in Real Option Models

We investigate an entropy-regularized reinforcement learning (RL) approach to optimal stopping problems motivated by real option models. Classical stopping rules are strict and non-randomized, limiting natural exploration in RL settings. To…

Optimization and Control · Mathematics 2026-02-18 Jodi Dianetti, Giorgio Ferrari, Renyuan Xu

Reinforcement Learning for Speculative Trading under Exploratory Framework

We study a speculative trading problem within the exploratory reinforcement learning (RL) framework of Wang et al. [2020]. The problem is formulated as a sequential optimal stopping problem over entry and exit times under general utility…

Mathematical Finance · Quantitative Finance 2026-04-03 Yun Zhao, Alex S. L. Tse, Harry Zheng

Continuous-time reinforcement learning for optimal switching over multiple regimes

This paper studies the continuous-time reinforcement learning (RL) for optimal switching problems across multiple regimes. We consider a type of exploratory formulation under entropy regularization where the agent randomizes both the timing…

Optimization and Control · Mathematics 2025-12-23 Yijie Huang, Mengge Li, Xiang Yu, Zhou Zhou

Robust Exploratory Stopping under Ambiguity in Reinforcement Learning

We propose and analyze a continuous-time robust reinforcement learning framework for optimal stopping under ambiguity. In this framework, an agent chooses a robust exploratory stopping time motivated by two objectives: robust…

Optimization and Control · Mathematics 2026-04-17 Junyan Ye, Hoi Ying Wong, Kyunghyun Park

Reinforcement Learning with an Abrupt Model Change

The problem of reinforcement learning is considered where the environment or the model undergoes a change. An algorithm is proposed that an agent can apply in such a problem to achieve the optimal long-time discounted reward. The algorithm…

Systems and Control · Electrical Eng. & Systems 2023-04-25 Wuxia Chen, Taposh Banerjee, Jemin George, Carl Busart

Reinforcement Learning in Economics and Finance

Reinforcement learning algorithms describe how an agent can learn an optimal action policy in a sequential decision process, through repeated experience. In a given environment, the agent policy provides him some running and terminal…

Theoretical Economics · Economics 2020-03-24 Arthur Charpentier, Romuald Elie, Carl Remlinger

Constrained Exploration in Reinforcement Learning with Optimality Preservation

We consider a class of reinforcement-learning systems in which the agent follows a behavior policy to explore a discrete state-action space to find an optimal policy while adhering to some restriction on its behavior. Such restriction may…

Machine Learning · Computer Science 2023-04-07 Peter C. Y. Chen

Learning Intrusion Prevention Policies through Optimal Stopping

We study automated intrusion prevention using reinforcement learning. In a novel approach, we formulate the problem of intrusion prevention as an optimal stopping problem. This formulation allows us insight into the structure of the optimal…

Artificial Intelligence · Computer Science 2024-04-23 Kim Hammar, Rolf Stadler

Constrained Exploration and Recovery from Experience Shaping

We consider the problem of reinforcement learning under safety requirements, in which an agent is trained to complete a given task, typically formalized as the maximization of a reward signal over time, while concurrently avoiding…

Machine Learning · Computer Science 2018-09-25 Tu-Hoa Pham, Giovanni De Magistris, Don Joven Agravante, Subhajit Chaudhury, Asim Munawar, Ryuki Tachibana

Reinforcement Learning with Non-Exponential Discounting

Commonly in reinforcement learning (RL), rewards are discounted over time using an exponential function to model time preference, thereby bounding the expected long-term reward. In contrast, in economics and psychology, it has been shown…

Machine Learning · Computer Science 2022-12-08 Matthias Schultheis, Constantin A. Rothkopf, Heinz Koeppl

On Multilateral Hierarchical Dynamic Decisions

Many decision problems in economics, information technology, and industry can be transformed to an optimal stopping of adapted random vectors with some utility function over the set of Markov times with respect to filtration built by the…

Optimization and Control · Mathematics 2020-11-04 Krzysztof Szajowski

Optimal Investment and Entropy-Regularized Learning Under Stochastic Volatility Models with Portfolio Constraints

We study the problem of optimal portfolio selection under stochastic volatility within a continuous time reinforcement learning framework with portfolio constraints. Exploration is modeled through entropy-regularized relaxed controls, where…

Mathematical Finance · Quantitative Finance 2026-04-27 Thai Nguyen, Pertiny Nkuize

A Globally Convergent Evolutionary Strategy for Stochastic Constrained Optimization with Applications to Reinforcement Learning

Evolutionary strategies have recently been shown to achieve competing levels of performance for complex optimization problems in reinforcement learning. In such problems, one often needs to optimize an objective function subject to a set of…

Neural and Evolutionary Computing · Computer Science 2022-02-23 Youssef Diouane, Aurelien Lucchi, Vihang Patil

Reinforcement Learning with Random Time Horizons

We extend the standard reinforcement learning framework to random time horizons. While the classical setting typically assumes finite and deterministic or infinite runtimes of trajectories, we argue that multiple real-world applications…

Machine Learning · Computer Science 2025-08-15 Enric Ribera Borrell, Lorenz Richter, Christof Schütte

Research on Optimal Control Problem Based on Reinforcement Learning under Knightian Uncertainty

Considering that the decision-making environment faced by reinforcement learning (RL) agents is full of Knightian uncertainty, this paper describes the exploratory state dynamics equation in Knightian uncertainty to study the…

Optimization and Control · Mathematics 2026-01-27 Ziyu Li, Chen Fei, Weiyin Fei

Optimal Options for Multi-Task Reinforcement Learning Under Time Constraints

Reinforcement learning can greatly benefit from the use of options as a way of encoding recurring behaviours and to foster exploration. An important open problem is how an agent can autonomously learn useful options when solving particular…

Machine Learning · Computer Science 2020-01-07 Manuel Del Verme, Bruno Castro da Silva, Gianluca Baldassarre

Recursive Constraints to Prevent Instability in Constrained Reinforcement Learning

We consider the challenge of finding a deterministic policy for a Markov decision process that uniformly (in all states) maximizes one reward subject to a probabilistic constraint over a different reward. Existing solutions do not fully…

Machine Learning · Computer Science 2022-01-21 Jaeyoung Lee, Sean Sedwards, Krzysztof Czarnecki

Provably Safe Reinforcement Learning for Stochastic Reach-Avoid Problems with Entropy Regularization

We consider the problem of learning the optimal policy for Markov decision processes with safety constraints. We formulate the problem in a reach-avoid setup. Our goal is to design online reinforcement learning algorithms that ensure safety…

Machine Learning · Computer Science 2026-01-21 Abhijit Mazumdar, Rafal Wisniewski, Manuela L. Bujorianu

Safe Reinforcement Learning for Constrained Markov Decision Processes with Stochastic Stopping Time

In this paper, we present an online reinforcement learning algorithm for constrained Markov decision processes with a safety constraint. Despite the necessary attention of the scientific community, considering stochastic stopping time, the…

Machine Learning · Computer Science 2024-03-26 Abhijit Mazumdar, Rafal Wisniewski, Manuela L. Bujorianu