Related papers: Hyperparameter Selection Methods for Fitted Q-Eval…

A Fine-grained Analysis of Fitted Q-evaluation: Beyond Parametric Models

In this paper, we delve into the statistical analysis of the fitted Q-evaluation (FQE) method, which focuses on estimating the value of a target policy using offline data generated by some behavior policy. We provide a comprehensive…

Statistics Theory · Mathematics 2024-06-18 Jiayi Wang, Zhengling Qi, Raymond K. W. Wong

Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory

Off-Policy Evaluation (OPE) serves as one of the cornerstones in Reinforcement Learning (RL). Fitted Q Evaluation (FQE) with various function approximators, especially deep neural networks, has gained practical success. While statistical…

Machine Learning · Statistics 2022-02-11 Ruiqi Zhang, Xuezhou Zhang, Chengzhuo Ni, Mengdi Wang

Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning

How to select between policies and value functions produced by different training algorithms in offline reinforcement learning (RL) -- which is crucial for hyperparameter tuning -- is an important open question. Existing approaches based…

Machine Learning · Computer Science 2021-11-04 Siyuan Zhang, Nan Jiang

Model Selection for Off-policy Evaluation: New Algorithms and Experimental Protocol

Holdout validation and hyperparameter tuning from data is a long-standing problem in offline reinforcement learning (RL). A standard framework is to use off-policy evaluation (OPE) methods to evaluate and select the policies, but OPE either…

Machine Learning · Computer Science 2025-10-27 Pai Liu, Lingfeng Zhao, Shivangi Agarwal, Jinghan Liu, Audrey Huang, Philip Amortila, Nan Jiang

Bootstrapping Fitted Q-Evaluation for Off-Policy Inference

Bootstrapping provides a flexible and effective approach for assessing the quality of batch reinforcement learning, yet its theoretical property is less understood. In this paper, we study the use of bootstrapping in off-policy evaluation…

Machine Learning · Statistics 2022-05-24 Botao Hao, Xiang Ji, Yaqi Duan, Hao Lu, Csaba Szepesvári, Mengdi Wang

When is Offline Policy Selection Sample Efficient for Reinforcement Learning?

Offline reinforcement learning algorithms often require careful hyperparameter tuning. Before deployment, we need to select amongst a set of candidate policies. However, there is limited understanding about the fundamental limits of this…

Machine Learning · Computer Science 2026-02-17 Vincent Liu, Prabhat Nagarajan, Andrew Patterson, Martha White

Reinforcement Learning with Quasi-Hyperbolic Discounting

Reinforcement learning has traditionally been studied with exponential discounting or the average reward setup, mainly due to their mathematical tractability. However, such frameworks fall short of accurately capturing human behavior, which…

Machine Learning · Computer Science 2024-09-18 S. R. Eshwar, Mayank Motwani, Nibedita Roy, Gugan Thoppe

Hyperparameter Selection for Offline Reinforcement Learning

Offline reinforcement learning (RL purely from logged data) is an important avenue for deploying RL techniques in real-world scenarios. However, existing hyperparameter selection methods for offline RL break the offline assumption by…

Machine Learning · Computer Science 2020-07-20 Tom Le Paine, Cosmin Paduraru, Andrea Michi, Caglar Gulcehre, Konrad Zolna, Alexander Novikov, Ziyu Wang, Nando de Freitas

A Principled Path to Fitted Distributional Evaluation

In reinforcement learning, distributional off-policy evaluation (OPE) focuses on estimating the return distribution of a target policy using offline data collected under a different policy. This work focuses on extending the widely used…

Machine Learning · Statistics 2025-10-21 Sungee Hong, Jiayi Wang, Zhengling Qi, Raymond K. W. Wong

In reinforcement learning, off-policy evaluation (OPE) is the problem of estimating the expected return of an evaluation policy given a fixed dataset that was collected by running one or more different policies. One of the more empirically…

Machine Learning · Computer Science 2023-10-31 Brahma S. Pavse, Josiah P. Hanna

Model Selection for Offline Reinforcement Learning: Practical Considerations for Healthcare Settings

Reinforcement learning (RL) can be used to learn treatment policies and aid decision making in healthcare. However, given the need for generalization over complex state/action spaces, the incorporation of function approximators (e.g., deep…

Machine Learning · Computer Science 2021-07-26 Shengpu Tang, Jenna Wiens

Fitted $Q$ Evaluation Without Bellman Completeness via Stationary Weighting

Fitted $Q$-evaluation (FQE) is a standard regression-based tool for off-policy evaluation, but existing stability guarantees often rely on Bellman completeness, a strong closure condition that can fail under function approximation. We study…

Machine Learning · Statistics 2026-05-11 Lars van der Laan, Nathan Kallus

Automated Off-Policy Estimator Selection via Supervised Learning

The Off-Policy Evaluation (OPE) problem consists of evaluating the performance of counterfactual policies with data collected by another one. To solve the OPE problem, we resort to estimators, which aim to estimate in the most accurate way…

Machine Learning · Computer Science 2024-11-12 Nicolò Felicioni, Michael Benigni, Maurizio Ferrari Dacrema

Sample Complexity of Preference-Based Nonparametric Off-Policy Evaluation with Deep Networks

A recently popular approach to solving reinforcement learning is with data from human preferences. In fact, human preference data are now used with classic reinforcement learning algorithms such as actor-critic methods, which involve…

Machine Learning · Computer Science 2024-02-28 Zihao Li, Xiang Ji, Minshuo Chen, Mengdi Wang

Quantum Amplitude Estimation in the Presence of Noise

Quantum Amplitude Estimation (QAE) -- a technique by which the amplitude of a given quantum state can be estimated with quadratically fewer queries than by standard sampling -- is a key sub-routine in several important quantum algorithms,…

Quantum Physics · Physics 2020-06-26 Eric G. Brown, Oktay Goktas, W. K. Tham

Toward Efficient Automated Feature Engineering

Automated Feature Engineering (AFE) refers to automatically generating and selecting optimal feature sets for downstream tasks, which has achieved great success in real-world applications. Current AFE methods mainly focus on improving the…

Machine Learning · Computer Science 2022-12-27 Kafeng Wang, Pengyang Wang, Chengzhong Xu

Physics-Inspired Extrapolation for efficient error mitigation and hardware certification

Quantum error mitigation (QEM) is essential for the noisy intermediate-scale quantum era, and will remain relevant for early fault-tolerant quantum computers, where logical error rates are still significant. However, most QEM methods incur…

Quantum Physics · Physics 2026-03-25 Pablo Díez-Valle, Gaurav Saxena, Jack S. Baker, Jun-Ho Lee, Thi Ha Kyaw

Parameter-free Online Test-time Adaptation

Training state-of-the-art vision models has become prohibitively expensive for researchers and practitioners. For the sake of accessibility and resource reuse, it is important to focus on adapting these models to a variety of downstream…

Computer Vision and Pattern Recognition · Computer Science 2022-04-05 Malik Boudiaf, Romain Mueller, Ismail Ben Ayed, Luca Bertinetto

Learnable Quantum Efficiency Filters for Urban Hyperspectral Segmentation

Hyperspectral sensing provides rich spectral information for scene understanding in urban driving, but its high dimensionality poses challenges for interpretation and efficient learning. We introduce Learnable Quantum Efficiency (LQE), a…

Computer Vision and Pattern Recognition · Computer Science 2026-04-17 Imad Ali Shah, Jiarong Li, Ethan Delaney, Enda Ward, Martin Glavin, Edward Jones, Brian Deegan

Improving Hyperparameter Optimization by Planning Ahead

Hyperparameter optimization (HPO) is generally treated as a bi-level optimization problem that involves fitting a (probabilistic) surrogate model to a set of observed hyperparameter responses, e.g. validation loss, and consequently…

Machine Learning · Computer Science 2021-10-18 Hadi S. Jomaa, Jonas Falkner, Lars Schmidt-Thieme