Related papers: A Batch Sequential Halving Algorithm without Perfo…

200 papers

In a fixed-confidence pure exploration problem in stochastic multi-armed bandits, an algorithm iteratively samples arms and should stop as early as possible and return the correct answer to a query about the arms' distributions. We are…

Machine Learning · Computer Science 2025-02-04 Adrienne Tuynman, Rémy Degenne

Recently, the multi-armed bandit problem has arisen in many real-life scenarios where arms must be sampled in batches, because the agent can wait only a limited time for feedback. Such applications include biological experimentation and online…

Machine Learning · Statistics 2023-12-22 Shengyu Cao, Simai He, Ruoqing Jiang, Jin Xu, Hongsong Yuan

Simple regret is a natural and parameter-free performance criterion for pure exploration in multi-armed bandits, yet it is less popular than the probability of missing the best arm or an $\epsilon$-good arm, perhaps due to lack of easy ways to…

Machine Learning · Computer Science 2023-02-03 Yao Zhao, Connor James Stephens, Csaba Szepesvári, Kwang-Sung Jun

We study Thompson Sampling algorithms for stochastic multi-armed bandits in the batched setting, in which we want to minimize the regret over a sequence of arm pulls using a small number of policy changes (or, batches). We propose two…

Machine Learning · Computer Science 2021-08-17 Nikolai Karpov, Qin Zhang

We consider the best-arm identification problem in multi-armed bandits, which focuses purely on exploration. A player is given a fixed budget to explore a finite set of arms, and the rewards of each arm are drawn independently from a fixed,…

Machine Learning · Statistics 2017-08-02 Shahin Shahrampour, Mohammad Noshad, Vahid Tarokh

We study the sequential batch learning problem in linear contextual bandits with finite action sets, where the decision maker is constrained to split incoming individuals into (at most) a fixed number of batches and can only observe…

Machine Learning · Computer Science 2020-04-15 Yanjun Han, Zhengqing Zhou, Zhengyuan Zhou, Jose Blanchet, Peter W. Glynn, Yinyu Ye

We study the asymptotic performance of the Thompson sampling algorithm in the batched multi-armed bandit setting where the time horizon $T$ is divided into batches, and the agent is not able to observe the rewards of her actions until the…

Machine Learning · Computer Science 2021-10-04 Cem Kalkanli, Ayfer Ozgur

Multi-armed bandits are one of the theoretical pillars of reinforcement learning. Recently, the investigation of quantum algorithms for multi-armed bandit problems was started, and it was found that a quadratic speed-up (in query…

Quantum Physics · Physics 2025-03-26 Simon Buchholz, Jonas M. Kübler, Bernhard Schölkopf

In this paper, we study the multi-armed bandit problem in the batched setting where the employed policy must split data into a small number of batches. While the minimax regret for two-armed stochastic bandits has been completely…

Machine Learning · Statistics 2019-10-29 Zijun Gao, Yanjun Han, Zhimei Ren, Zhengqing Zhou

Learning from prior tasks and transferring that experience to improve future performance is critical for building lifelong learning agents. Although results in supervised and reinforcement learning show that transfer may significantly…

Machine Learning · Statistics 2013-07-29 Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill

We completely analyze the convergence speed of the batch learning algorithm, and compare its speed to that of the memoryless learning algorithm and of learning with memory. We show that the batch learning algorithm is never worse…

Machine Learning · Computer Science 2007-05-23 Igor Rivin

The combinatorial stochastic semi-bandit problem is an extension of the classical multi-armed bandit problem in which an algorithm pulls more than one arm at each stage and the rewards of all pulled arms are revealed. One difference with…

Machine Learning · Computer Science 2016-12-07 Rémy Degenne, Vianney Perchet

We present simple and efficient algorithms for the batched stochastic multi-armed bandit and batched stochastic linear bandit problems. We prove bounds for their expected regrets that improve over the best-known regret bounds for any number…

Data Structures and Algorithms · Computer Science 2020-02-19 Hossein Esfandiari, Amin Karbasi, Abbas Mehrabian, Vahab Mirrokni

This paper considers a multi-armed bandit game where the number of arms is much larger than the maximum budget and is effectively infinite. We characterize necessary and sufficient conditions on the total budget for an algorithm to return…

Machine Learning · Statistics 2019-01-15 Maryam Aziz, Kevin Jamieson, Javed Aslam

Bayesian Optimization aims at optimizing an unknown non-convex/concave function that is costly to evaluate. We are interested in application scenarios where concurrent function evaluations are possible. Under such a setting, BO could choose…

Artificial Intelligence · Computer Science 2012-05-02 Javad Azimi, Ali Jalali, Xiaoli Fern

Many real-world functions are defined over both categorical and category-specific continuous variables and thus cannot be optimized by traditional Bayesian optimization (BO) methods. To optimize such functions, we propose a new method that…

Machine Learning · Computer Science 2019-12-02 Dang Nguyen, Sunil Gupta, Santu Rana, Alistair Shilton, Svetha Venkatesh

We consider a special case of bandit problems, named batched bandits, in which an agent observes batches of responses over a certain time period. Unlike previous work, we consider a more practically relevant batch-centric scenario of batch…

Machine Learning · Computer Science 2023-04-04 Danil Provodin, Pratik Gajane, Mykola Pechenizkiy, Maurits Kaptein

Often, machine learning applications have to cope with dynamic environments where data are collected in the form of continuous data streams with potentially infinite length and transient behavior. Compared to traditional (batch) data…

Machine Learning · Computer Science 2021-12-21 Guilherme Cassales, Heitor Gomes, Albert Bifet, Bernhard Pfahringer, Hermes Senger

Increasing the mini-batch size for stochastic gradient descent offers significant opportunities to reduce wall-clock training time, but there are a variety of theoretical and systems challenges that impede the widespread success of this…

Efficiently trading off exploration and exploitation is one of the key challenges in online Reinforcement Learning (RL). Most works achieve this by carefully estimating the model uncertainty and following the so-called optimistic model.…

Machine Learning · Computer Science 2024-09-16 Asaf Cassel, Orin Levy, Yishay Mansour