Related papers: Hyperparameter Selection Methods for Fitted Q-Eval…

200 papers

In this paper, we delve into the statistical analysis of the fitted Q-evaluation (FQE) method, which focuses on estimating the value of a target policy using offline data generated by some behavior policy. We provide a comprehensive…

Statistics Theory · Mathematics 2024-06-18 Jiayi Wang, Zhengling Qi, Raymond K. W. Wong

Off-Policy Evaluation (OPE) serves as one of the cornerstones in Reinforcement Learning (RL). Fitted Q Evaluation (FQE) with various function approximators, especially deep neural networks, has gained practical success. While statistical…

Machine Learning · Statistics 2022-02-11 Ruiqi Zhang, Xuezhou Zhang, Chengzhuo Ni, Mengdi Wang

How to select between policies and value functions produced by different training algorithms in offline reinforcement learning (RL) -- which is crucial for hyperparameter tuning -- is an important open question. Existing approaches based…

Machine Learning · Computer Science 2021-11-04 Siyuan Zhang, Nan Jiang

Holdout validation and hyperparameter tuning from data is a long-standing problem in offline reinforcement learning (RL). A standard framework is to use off-policy evaluation (OPE) methods to evaluate and select the policies, but OPE either…

Machine Learning · Computer Science 2025-10-27 Pai Liu, Lingfeng Zhao, Shivangi Agarwal, Jinghan Liu, Audrey Huang, Philip Amortila, Nan Jiang

Bootstrapping provides a flexible and effective approach for assessing the quality of batch reinforcement learning, yet its theoretical property is less understood. In this paper, we study the use of bootstrapping in off-policy evaluation…

Machine Learning · Statistics 2022-05-24 Botao Hao, Xiang Ji, Yaqi Duan, Hao Lu, Csaba Szepesvári, Mengdi Wang

Offline reinforcement learning algorithms often require careful hyperparameter tuning. Before deployment, we need to select amongst a set of candidate policies. However, there is limited understanding about the fundamental limits of this…

Machine Learning · Computer Science 2026-02-17 Vincent Liu, Prabhat Nagarajan, Andrew Patterson, Martha White

Reinforcement learning has traditionally been studied with exponential discounting or the average reward setup, mainly due to their mathematical tractability. However, such frameworks fall short of accurately capturing human behavior, which…

Machine Learning · Computer Science 2024-09-18 S. R. Eshwar, Mayank Motwani, Nibedita Roy, Gugan Thoppe

Offline reinforcement learning (RL purely from logged data) is an important avenue for deploying RL techniques in real-world scenarios. However, existing hyperparameter selection methods for offline RL break the offline assumption by…

In reinforcement learning, distributional off-policy evaluation (OPE) focuses on estimating the return distribution of a target policy using offline data collected under a different policy. This work focuses on extending the widely used…

Machine Learning · Statistics 2025-10-21 Sungee Hong, Jiayi Wang, Zhengling Qi, Raymond K. W. Wong

In reinforcement learning, off-policy evaluation (OPE) is the problem of estimating the expected return of an evaluation policy given a fixed dataset that was collected by running one or more different policies. One of the more empirically…

Machine Learning · Computer Science 2023-10-31 Brahma S. Pavse, Josiah P. Hanna

Reinforcement learning (RL) can be used to learn treatment policies and aid decision making in healthcare. However, given the need for generalization over complex state/action spaces, the incorporation of function approximators (e.g., deep…

Machine Learning · Computer Science 2021-07-26 Shengpu Tang, Jenna Wiens

Fitted $Q$-evaluation (FQE) is a standard regression-based tool for off-policy evaluation, but existing stability guarantees often rely on Bellman completeness, a strong closure condition that can fail under function approximation. We study…

Machine Learning · Statistics 2026-05-11 Lars van der Laan, Nathan Kallus

The Off-Policy Evaluation (OPE) problem consists of evaluating the performance of counterfactual policies with data collected by another one. To solve the OPE problem, we resort to estimators, which aim to estimate in the most accurate way…

Machine Learning · Computer Science 2024-11-12 Nicolò Felicioni, Michael Benigni, Maurizio Ferrari Dacrema

A recently popular approach to solving reinforcement learning is with data from human preferences. In fact, human preference data are now used with classic reinforcement learning algorithms such as actor-critic methods, which involve…

Machine Learning · Computer Science 2024-02-28 Zihao Li, Xiang Ji, Minshuo Chen, Mengdi Wang

Quantum Amplitude Estimation (QAE) -- a technique by which the amplitude of a given quantum state can be estimated with quadratically fewer queries than by standard sampling -- is a key sub-routine in several important quantum algorithms,…

Quantum Physics · Physics 2020-06-26 Eric G. Brown, Oktay Goktas, W. K. Tham

Automated Feature Engineering (AFE) refers to automatically generating and selecting optimal feature sets for downstream tasks, which has achieved great success in real-world applications. Current AFE methods mainly focus on improving the…

Machine Learning · Computer Science 2022-12-27 Kafeng Wang, Pengyang Wang, Chengzhong Xu

Quantum error mitigation (QEM) is essential for the noisy intermediate-scale quantum era, and will remain relevant for early fault-tolerant quantum computers, where logical error rates are still significant. However, most QEM methods incur…

Quantum Physics · Physics 2026-03-25 Pablo Díez-Valle, Gaurav Saxena, Jack S. Baker, Jun-Ho Lee, Thi Ha Kyaw

Training state-of-the-art vision models has become prohibitively expensive for researchers and practitioners. For the sake of accessibility and resource reuse, it is important to focus on adapting these models to a variety of downstream…

Computer Vision and Pattern Recognition · Computer Science 2022-04-05 Malik Boudiaf, Romain Mueller, Ismail Ben Ayed, Luca Bertinetto

Hyperspectral sensing provides rich spectral information for scene understanding in urban driving, but its high dimensionality poses challenges for interpretation and efficient learning. We introduce Learnable Quantum Efficiency (LQE), a…

Computer Vision and Pattern Recognition · Computer Science 2026-04-17 Imad Ali Shah, Jiarong Li, Ethan Delaney, Enda Ward, Martin Glavin, Edward Jones, Brian Deegan

Hyperparameter optimization (HPO) is generally treated as a bi-level optimization problem that involves fitting a (probabilistic) surrogate model to a set of observed hyperparameter responses, e.g. validation loss, and consequently…

Machine Learning · Computer Science 2021-10-18 Hadi S. Jomaa, Jonas Falkner, Lars Schmidt-Thieme