Related papers: A Cross Entropy based Stochastic Approximation Alg…
In this paper, we provide two new stable online algorithms for the problem of prediction in reinforcement learning, \emph{i.e.}, estimating the value function of a model-free Markov reward process using the linear function approximation…
The cross entropy (CE) method is a model based search method to solve optimization problems where the objective function has minimal structure. The Monte-Carlo version of the CE method employs the naive sample averaging technique which is…
Reinforcement learning (RL) problems are fundamental in online decision-making and have been instrumental in finding an optimal policy for Markov decision processes (MDPs). Function approximations are usually deployed to handle large or…
To overcome the curses of dimensionality and modeling of Dynamic Programming (DP) methods to solve Markov Decision Process (MDP) problems, Reinforcement Learning (RL) methods are adopted in practice. Contrary to traditional RL algorithms…
This paper deals with the estimation of rare event probabilities using importance sampling (IS), where an optimal proposal distribution is computed with the cross-entropy (CE) method. Although, IS optimized with the CE method leads to an…
We present an iterative inverse reinforcement learning algorithm to infer optimal cost functions in continuous spaces. Based on a popular maximum entropy criteria, our approach iteratively finds a weight improvement step and proposes a…
In this paper, we present a cross-entropy optimization method for hyperparameter optimization in stochastic gradient-based approaches to train deep neural networks. The value of a hyperparameter of a learning algorithm often has great…
We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes that involves the use of stochastic approximation algorithms along with state-of-the-art techniques that are…
The cross-entropy (CE) method is a popular stochastic method for optimization due to its simplicity and effectiveness. Designed for rare-event simulations where the probability of a target event occurring is relatively small, the CE-method…
Designing sample-efficient and computationally feasible reinforcement learning (RL) algorithms is particularly challenging in environments with large or infinite state and action spaces. In this paper, we advance this effort by presenting…
Parameter estimation in Markov random fields (MRFs) is a difficult task, in which inference over the network is run in the inner loop of a gradient descent procedure. Replacing exact inference with approximate methods such as loopy belief…
We study inverse reinforcement learning (IRL) and imitation learning (IM), the problems of recovering a reward or policy function from expert's demonstrated trajectories. We propose a new way to improve the learning process by adding a…
We study model-based reinforcement learning (RL) for episodic Markov decision processes (MDP) whose transition probability is parametrized by an unknown transition core with features of state and action. Despite much recent progress in…
Trajectory optimizers for model-based reinforcement learning, such as the Cross-Entropy Method (CEM), can yield compelling results even in high-dimensional control tasks and sparse-reward environments. However, their sampling inefficiency…
In deep Reinforcement Learning (RL), value functions are typically approximated using deep neural networks and trained via mean squared error regression objectives to fit the true value functions. Recent research has proposed an alternative…
This paper studies the continuous-time reinforcement learning (RL) for optimal switching problems across multiple regimes. We consider a type of exploratory formulation under entropy regularization where the agent randomizes both the timing…
To overcome the curse of dimensionality and curse of modeling in Dynamic Programming (DP) methods for solving classical Markov Decision Process (MDP) problems, Reinforcement Learning (RL) algorithms are popular. In this paper, we consider…
Large language models (LLMs) have shown promise in performing complex multi-step reasoning, yet they continue to struggle with mathematical reasoning, often making systematic errors. A promising solution is reinforcement learning (RL)…
Model-free deep-reinforcement-based learning algorithms have been applied to a range of COPs~\cite{bello2016neural}~\cite{kool2018attention}~\cite{nazari2018reinforcement}. However, these approaches suffer from two key challenges when…
A basic simulation-based reinforcement learning algorithm is the Monte Carlo Exploring States (MCES) method, also known as optimistic policy iteration, in which the value function is approximated by simulated returns and a greedy policy is…