Related papers: Hyperparameter Selection Methods for Fitted Q-Eval…
In this paper, we delve into the statistical analysis of the fitted Q-evaluation (FQE) method, which focuses on estimating the value of a target policy using offline data generated by some behavior policy. We provide a comprehensive…
Off-Policy Evaluation (OPE) serves as one of the cornerstones in Reinforcement Learning (RL). Fitted Q Evaluation (FQE) with various function approximators, especially deep neural networks, has gained practical success. While statistical…
How to select between policies and value functions produced by different training algorithms in offline reinforcement learning (RL) -- which is crucial for hyperpa-rameter tuning -- is an important open question. Existing approaches based…
Holdout validation and hyperparameter tuning from data is a long-standing problem in offline reinforcement learning (RL). A standard framework is to use off-policy evaluation (OPE) methods to evaluate and select the policies, but OPE either…
Bootstrapping provides a flexible and effective approach for assessing the quality of batch reinforcement learning, yet its theoretical property is less understood. In this paper, we study the use of bootstrapping in off-policy evaluation…
Offline reinforcement learning algorithms often require careful hyperparameter tuning. Before deployment, we need to select amongst a set of candidate policies. However, there is limited understanding about the fundamental limits of this…
Reinforcement learning has traditionally been studied with exponential discounting or the average reward setup, mainly due to their mathematical tractability. However, such frameworks fall short of accurately capturing human behavior, which…
Offline reinforcement learning (RL purely from logged data) is an important avenue for deploying RL techniques in real-world scenarios. However, existing hyperparameter selection methods for offline RL break the offline assumption by…
In reinforcement learning, distributional off-policy evaluation (OPE) focuses on estimating the return distribution of a target policy using offline data collected under a different policy. This work focuses on extending the widely used…
In reinforcement learning, off-policy evaluation (OPE) is the problem of estimating the expected return of an evaluation policy given a fixed dataset that was collected by running one or more different policies. One of the more empirically…
Reinforcement learning (RL) can be used to learn treatment policies and aid decision making in healthcare. However, given the need for generalization over complex state/action spaces, the incorporation of function approximators (e.g., deep…
Fitted $Q$-evaluation (FQE) is a standard regression-based tool for off-policy evaluation, but existing stability guarantees often rely on Bellman completeness, a strong closure condition that can fail under function approximation. We study…
The Off-Policy Evaluation (OPE) problem consists of evaluating the performance of counterfactual policies with data collected by another one. To solve the OPE problem, we resort to estimators, which aim to estimate in the most accurate way…
A recently popular approach to solving reinforcement learning is with data from human preferences. In fact, human preference data are now used with classic reinforcement learning algorithms such as actor-critic methods, which involve…
Quantum Amplitude Estimation (QAE) -- a technique by which the amplitude of a given quantum state can be estimated with quadratically fewer queries than by standard sampling -- is a key sub-routine in several important quantum algorithms,…
Automated Feature Engineering (AFE) refers to automatically generate and select optimal feature sets for downstream tasks, which has achieved great success in real-world applications. Current AFE methods mainly focus on improving the…
Quantum error mitigation (QEM) is essential for the noisy intermediate-scale quantum era, and will remain relevant for early fault-tolerant quantum computers, where logical error rates are still significant. However, most QEM methods incur…
Training state-of-the-art vision models has become prohibitively expensive for researchers and practitioners. For the sake of accessibility and resource reuse, it is important to focus on adapting these models to a variety of downstream…
Hyperspectral sensing provides rich spectral information for scene understanding in urban driving, but its high dimensionality poses challenges for interpretation and efficient learning. We introduce Learnable Quantum Efficiency (LQE), a…
Hyperparameter optimization (HPO) is generally treated as a bi-level optimization problem that involves fitting a (probabilistic) surrogate model to a set of observed hyperparameter responses, e.g. validation loss, and consequently…