Related papers: Sample-efficient Cross-Entropy Method for Real-tim…
Recent works in high-dimensional model-predictive control and model-based reinforcement learning with learned dynamics and reward models have resorted to population-based optimization methods, such as the Cross-Entropy Method (CEM), for…
Current state-of-the-art model-based reinforcement learning algorithms use trajectory sampling methods, such as the Cross-Entropy Method (CEM), for planning in continuous control settings. These zeroth-order optimizers require sampling a…
Model-based reinforcement learning (MBRL) with real-time planning has shown great potential in locomotion and manipulation control tasks. However, the existing planning methods, such as the Cross-Entropy Method (CEM), do not scale well to…
Cross-Entropy Method (CEM) is commonly used for planning in model-based reinforcement learning (MBRL) where a centralized approach is typically utilized to update the sampling distribution based on only the top-$k$ operation's results on…
Cross-entropy method model predictive control (CEM--MPC) is a powerful gradient-free technique for nonlinear optimal control, but its performance is often limited by the reliance on random sampling. This conventional approach can lead to…
We study the cross-entropy method (CEM) for the non-convex optimization of a continuous and parameterized objective function and introduce a differentiable variant that enables us to differentiate the output of CEM with respect to the…
The cross-entropy (CE) method is a popular stochastic method for optimization due to its simplicity and effectiveness. Designed for rare-event simulations where the probability of a target event occurring is relatively small, the CE-method…
In this work we present a new method of black-box optimization and constraint satisfaction. Existing algorithms that have attempted to solve this problem are unable to consider multiple modes, and are not able to adapt to changes in…
The cross entropy (CE) method is a model based search method to solve optimization problems where the objective function has minimal structure. The Monte-Carlo version of the CE method employs the naive sample averaging technique which is…
Model-free deep-reinforcement-based learning algorithms have been applied to a range of COPs~\cite{bello2016neural}~\cite{kool2018attention}~\cite{nazari2018reinforcement}. However, these approaches suffer from two key challenges when…
Deep networks have enabled reinforcement learning to scale to more complex and challenging domains, but these methods typically require large quantities of training data. An alternative is to use sample-efficient episodic control methods:…
The Cross Entropy method is a well-known adaptive importance sampling method for rare-event probability estimation, which requires estimating an optimal importance sampling density within a parametric class. In this article we estimate an…
Reinforcement learning can learn amortised design policies for designing sequences of experiments. However, current amortised methods rely on estimators of expected information gain (EIG) that require an exponential number of samples on the…
The cross-entropy method is a simple but efficient method for global optimization. In this paper we provide two online variants of the basic CEM, together with a proof of convergence.
Scalability is a major challenge in modern recommender systems. In sequential recommendations, full Cross-Entropy (CE) loss achieves state-of-the-art recommendation quality but consumes excessive GPU memory with large item catalogs,…
The Cross-Entropy Method (CEM) is a widely adopted trajectory optimizer in model-based reinforcement learning (MBRL), but its unimodal sampling strategy often leads to premature convergence in multimodal landscapes. In this work, we propose…
In this paper, we provide two new stable online algorithms for the problem of prediction in reinforcement learning, \emph{i.e.}, estimating the value function of a model-free Markov reward process using the linear function approximation…
In this paper, we provide a new algorithm for the problem of prediction in Reinforcement Learning, \emph{i.e.}, estimating the Value Function of a Markov Reward Process (MRP) using the linear function approximation architecture, with memory…
This paper studies the constrained/safe reinforcement learning (RL) problem with sparse indicator signals for constraint violations. We propose a model-based approach to enable RL agents to effectively explore the environment with unknown…
Recent advances in reinforcement learning have demonstrated its ability to solve hard agent-environment interaction tasks on a super-human level. However, the application of reinforcement learning methods to practical and real-world tasks…