Related papers: A Cross Entropy based Stochastic Approximation Alg…

An Online Prediction Algorithm for Reinforcement Learning with Linear Function Approximation using Cross Entropy Method

In this paper, we provide two new stable online algorithms for the problem of prediction in reinforcement learning, i.e., estimating the value function of a model-free Markov reward process using linear function approximation…

Machine Learning · Computer Science 2018-06-19 Ajin George Joseph, Shalabh Bhatnagar

A Cross Entropy based Optimization Algorithm with Global Convergence Guarantees

The cross entropy (CE) method is a model based search method to solve optimization problems where the objective function has minimal structure. The Monte-Carlo version of the CE method employs the naive sample averaging technique which is…

Artificial Intelligence · Computer Science 2018-02-01 Ajin George Joseph, Shalabh Bhatnagar

Adaptive Resolving Methods for Reinforcement Learning with Function Approximations

Reinforcement learning (RL) problems are fundamental in online decision-making and have been instrumental in finding an optimal policy for Markov decision processes (MDPs). Function approximations are usually deployed to handle large or…

Machine Learning · Computer Science 2025-05-20 Jiashuo Jiang, Yiming Zong, Yinyu Ye

Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes

To overcome the curses of dimensionality and modeling of Dynamic Programming (DP) methods to solve Markov Decision Process (MDP) problems, Reinforcement Learning (RL) methods are adopted in practice. Contrary to traditional RL algorithms…

Machine Learning · Computer Science 2021-08-24 Arghyadip Roy, Vivek Borkar, Abhay Karandikar, Prasanna Chaporkar

Selecting Reduced Models in the Cross-Entropy Method

This paper deals with the estimation of rare-event probabilities using importance sampling (IS), where an optimal proposal distribution is computed with the cross-entropy (CE) method. Although IS optimized with the CE method leads to an…

Computation · Statistics 2020-02-05 Patrick Héas

Cost Function Estimation Using Inverse Reinforcement Learning with Minimal Observations

We present an iterative inverse reinforcement learning algorithm to infer optimal cost functions in continuous spaces. Based on a popular maximum entropy criterion, our approach iteratively finds a weight improvement step and proposes a…

Machine Learning · Computer Science 2025-05-14 Sarmad Mehrdad, Avadesh Meduri, Ludovic Righetti

Cross-Entropy Optimization for Hyperparameter Optimization in Stochastic Gradient-based Approaches to Train Deep Neural Networks

In this paper, we present a cross-entropy optimization method for hyperparameter optimization in stochastic gradient-based approaches to train deep neural networks. The value of a hyperparameter of a learning algorithm often has great…

Machine Learning · Computer Science 2024-09-17 Kevin Li, Fulu Li

Reinforcement Learning with Unbiased Policy Evaluation and Linear Function Approximation

We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes that involves the use of stochastic approximation algorithms along with state-of-the-art techniques that are…

Machine Learning · Computer Science 2022-10-17 Anna Winnicki, R. Srikant

Cross-Entropy Method Variants for Optimization

The cross-entropy (CE) method is a popular stochastic method for optimization due to its simplicity and effectiveness. Designed for rare-event simulations where the probability of a target event occurring is relatively small, the CE-method…

Machine Learning · Computer Science 2020-09-22 Robert J. Moss

Sample and Oracle Efficient Reinforcement Learning for MDPs with Linearly-Realizable Value Functions

Designing sample-efficient and computationally feasible reinforcement learning (RL) algorithms is particularly challenging in environments with large or infinite state and action spaces. In this paper, we advance this effort by presenting…

Machine Learning · Computer Science 2024-10-04 Zakaria Mhammedi

Constrained Approximate Maximum Entropy Learning of Markov Random Fields

Parameter estimation in Markov random fields (MRFs) is a difficult task, in which inference over the network is run in the inner loop of a gradient descent procedure. Replacing exact inference with approximate methods such as loopy belief…

Machine Learning · Computer Science 2012-06-18 Varun Ganapathi, David Vickrey, John Duchi, Daphne Koller

Weighted Maximum Entropy Inverse Reinforcement Learning

We study inverse reinforcement learning (IRL) and imitation learning (IM), the problems of recovering a reward or policy function from an expert's demonstrated trajectories. We propose a new way to improve the learning process by adding a…

Machine Learning · Computer Science 2022-08-23 The Viet Bui, Tien Mai, Patrick Jaillet

Model-Based Reinforcement Learning with Multinomial Logistic Function Approximation

We study model-based reinforcement learning (RL) for episodic Markov decision processes (MDP) whose transition probability is parametrized by an unknown transition core with features of state and action. Despite much recent progress in…

Machine Learning · Statistics 2024-11-19 Taehyun Hwang, Min-hwan Oh

Sample-efficient Cross-Entropy Method for Real-time Planning

Trajectory optimizers for model-based reinforcement learning, such as the Cross-Entropy Method (CEM), can yield compelling results even in high-dimensional control tasks and sparse-reward environments. However, their sampling inefficiency…

Machine Learning · Computer Science 2020-08-17 Cristina Pinneri, Shambhuraj Sawant, Sebastian Blaes, Jan Achterhold, Joerg Stueckler, Michal Rolinek, Georg Martius

Is Value Functions Estimation with Classification Plug-and-play for Offline Reinforcement Learning?

In deep Reinforcement Learning (RL), value functions are typically approximated using deep neural networks and trained via mean squared error regression objectives to fit the true value functions. Recent research has proposed an alternative…

Machine Learning · Computer Science 2024-11-19 Denis Tarasov, Kirill Brilliantov, Dmitrii Kharlapenko

Continuous-time reinforcement learning for optimal switching over multiple regimes

This paper studies the continuous-time reinforcement learning (RL) for optimal switching problems across multiple regimes. We consider a type of exploratory formulation under entropy regularization where the agent randomizes both the timing…

Optimization and Control · Mathematics 2025-12-23 Yijie Huang, Mengge Li, Xiang Yu, Zhou Zhou

A Structure-aware Online Learning Algorithm for Markov Decision Processes

To overcome the curse of dimensionality and curse of modeling in Dynamic Programming (DP) methods for solving classical Markov Decision Process (MDP) problems, Reinforcement Learning (RL) algorithms are popular. In this paper, we consider…

Machine Learning · Computer Science 2018-11-29 Arghyadip Roy, Vivek Borkar, Abhay Karandikar, Prasanna Chaporkar

Entropy-Regularized Process Reward Model

Large language models (LLMs) have shown promise in performing complex multi-step reasoning, yet they continue to struggle with mathematical reasoning, often making systematic errors. A promising solution is reinforcement learning (RL)…

Machine Learning · Computer Science 2025-09-22 Hanning Zhang, Pengcheng Wang, Shizhe Diao, Yong Lin, Rui Pan, Hanze Dong, Dylan Zhang, Pavlo Molchanov, Tong Zhang

Sample-Efficient, Exploration-Based Policy Optimisation for Routing Problems

Model-free deep reinforcement learning algorithms have been applied to a range of combinatorial optimization problems (COPs) [bello2016neural, kool2018attention, nazari2018reinforcement]. However, these approaches suffer from two key challenges when…

Machine Learning · Computer Science 2022-06-01 Nasrin Sultana, Jeffrey Chan, Tabinda Sarwar, A. K. Qin

On the Convergence of Reinforcement Learning with Monte Carlo Exploring Starts

A basic simulation-based reinforcement learning algorithm is the Monte Carlo Exploring Starts (MCES) method, also known as optimistic policy iteration, in which the value function is approximated by simulated returns and a greedy policy is…

Optimization and Control · Mathematics 2020-07-22 Jun Liu