Related papers: Multiplier Bootstrap-based Exploration

200 papers

The multi-armed bandit (MAB) is a classical sequential decision problem. Most work requires assumptions about the reward distribution (e.g., bounded), while practitioners may have difficulty obtaining information about these distributions to…

Machine Learning · Computer Science 2023-12-14 Han Qi , Fei Guo , Li Zhu

In this paper, we propose a novel perturbation-based exploration method in bandit algorithms with bounded or unbounded rewards, called residual bootstrap exploration (ReBoot). ReBoot enforces exploration by injecting…

Machine Learning · Statistics 2020-02-21 Chi-Hua Wang , Yang Yu , Botao Hao , Guang Cheng

In this paper, we consider stochastic multi-armed bandits (MABs) with heavy-tailed rewards, whose $p$-th moment is bounded by a constant $\nu_{p}$ for $1<p\leq2$. First, we propose a novel robust estimator which does not require $\nu_{p}$…

Machine Learning · Computer Science 2021-10-28 Kyungjae Lee , Hongjun Yang , Sungbin Lim , Songhwai Oh

We study incentivized exploration in multi-armed bandit (MAB) settings with infinitely many arms modeled as elements in continuous metric spaces. Unlike classical bandit models, we consider scenarios where the decision-maker (principal)…

Machine Learning · Computer Science 2025-08-28 Sourav Chakraborty , Amit Kiran Rege , Claire Monteleoni , Lijun Chen

Efficiently trading off exploration and exploitation is one of the key challenges in online Reinforcement Learning (RL). Most works achieve this by carefully estimating the model uncertainty and following the so-called optimistic model.…

Machine Learning · Computer Science 2024-09-16 Asaf Cassel , Orin Levy , Yishay Mansour

The stochastic multi-armed bandit problem is a well-known model for studying the exploration-exploitation trade-off. It has significant possible applications in adaptive clinical trials, which allow for dynamic changes in the treatment…

Machine Learning · Computer Science 2019-06-11 Hossein Aboutalebi , Doina Precup , Tibor Schuster

The Upper Confidence Bound (UCB) method is arguably the most celebrated method used in online decision making with partial information feedback. Existing techniques for constructing confidence bounds are typically built upon various concentration…

Machine Learning · Statistics 2019-11-01 Botao Hao , Yasin Abbasi-Yadkori , Zheng Wen , Guang Cheng

Inspired by the Reward-Biased Maximum Likelihood Estimate method of adaptive control, we propose RBMLE -- a novel family of learning algorithms for stochastic multi-armed bandits (SMABs). For a broad range of SMABs including both the…

Machine Learning · Computer Science 2020-10-26 Xi Liu , Ping-Chun Hsieh , Anirban Bhattacharya , P. R. Kumar

In many platforms, user arrivals exhibit a self-reinforcing behavior: future user arrivals are likely to have preferences similar to users who were satisfied in the past. In other words, arrivals exhibit positive externalities. We study…

Machine Learning · Computer Science 2019-03-08 Virag Shah , Jose Blanchet , Ramesh Johari

We study the non-stationary stochastic multiarmed bandit (MAB) problem and propose two generic algorithms, namely, the limited memory deterministic sequencing of exploration and exploitation (LM-DSEE) and the Sliding-Window Upper Confidence…

Machine Learning · Statistics 2018-04-25 Lai Wei , Vaibhav Srivastava

Reward-biased maximum likelihood estimation (RBMLE) is a classic principle in the adaptive control literature for tackling explore-exploit trade-offs. This paper studies the stochastic contextual bandit problem with general bounded reward…

Machine Learning · Computer Science 2022-05-31 Yu-Heng Hung , Ping-Chun Hsieh

We study incentivized exploration for the multi-armed bandit (MAB) problem with non-stationary reward distributions, where players receive compensation for exploring arms other than the greedy choice and may provide biased feedback on the…

Machine Learning · Computer Science 2024-03-19 Sourav Chakraborty , Lijun Chen

The Multiarmed Bandits (MAB) problem has been extensively studied and has seen many practical applications in a variety of fields. The Survival Multiarmed Bandits (S-MAB) open problem is an extension which constrains an agent to a budget…

Machine Learning · Computer Science 2024-11-06 Peter Veroutis , Frédéric Godin

This paper introduces the framework of multi-armed sampling, which serves as the sampling counterpart to the optimization problem of multi-armed bandits. Our primary motivation is to rigorously examine the exploration-exploitation trade-off…

Machine Learning · Computer Science 2026-05-14 Mohammad Pedramfar , Siamak Ravanbakhsh

We propose an online algorithm for cumulative regret minimization in a stochastic multi-armed bandit. The algorithm adds $O(t)$ i.i.d. pseudo-rewards to its history in round $t$ and then pulls the arm with the highest average reward in its…

Machine Learning · Computer Science 2019-11-06 Branislav Kveton , Csaba Szepesvari , Mohammad Ghavamzadeh , Craig Boutilier

We propose a new bootstrap-based online algorithm for stochastic linear bandit problems. The key idea is to adopt residual bootstrap exploration, in which the agent estimates the next step reward by re-sampling the residuals of mean reward…

Machine Learning · Statistics 2022-06-20 Shuang Wu , Chi-Hua Wang , Yuantong Li , Guang Cheng

We study incentivized exploration for the multi-armed bandit (MAB) problem where the players receive compensation for exploring arms other than the greedy choice and may provide biased feedback on reward. We seek to understand the impact of…

Machine Learning · Computer Science 2019-12-17 Zhiyuan Liu , Huazheng Wang , Fan Shen , Kai Liu , Lijun Chen

This work addresses the problem of regret minimization in non-stochastic multi-armed bandit problems, focusing on performance guarantees that hold with high probability. Such results are rather scarce in the literature since proving them…

Machine Learning · Computer Science 2015-11-04 Gergely Neu

In budget-limited multi-armed bandit (MAB) problems, the learner's actions are costly and constrained by a fixed budget. Consequently, an optimal exploitation policy may not be to pull the optimal arm repeatedly, as is the case in other…

Artificial Intelligence · Computer Science 2012-04-10 Long Tran-Thanh , Archie Chapman , Alex Rogers , Nicholas R. Jennings

In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward models. At each time, a player selects one arm to play, aiming to maximize the total expected reward over a horizon of length T. An approach based on…

Optimization and Control · Mathematics 2013-03-12 Sattar Vakili , Keqin Liu , Qing Zhao