Related papers: Double Explore-then-Commit: Asymptotic Optimality …

Explore-then-Commit Algorithms for Decentralized Two-Sided Matching Markets

Online learning in decentralized two-sided matching markets, where the demand side (players) competes to match with the supply side (arms), has received substantial interest because it abstracts out the complex interactions in matching…

Machine Learning · Computer Science 2024-08-19 Tejas Pagare, Avishek Ghosh

Risk-Averse Explore-Then-Commit Algorithms for Finite-Time Bandits

In this paper, we study multi-armed bandit problems in the explore-then-commit setting. In our proposed explore-then-commit setting, the goal is to identify the best arm after a pure experimentation (exploration) phase and exploit it once or…

Machine Learning · Computer Science 2020-12-16 Ali Yekkehkhany, Ebrahim Arian, Mohammad Hajiesmaili, Rakesh Nagi

High-dimensional Contextual Bandit Problem without Sparsity

In this research, we investigate the high-dimensional linear contextual bandit problem where the number of features $p$ is greater than the budget $T$, or it may even be infinite. Differing from the majority of previous works in this field,…

Machine Learning · Statistics 2025-06-27 Junpei Komiyama, Masaaki Imaizumi

Explore-then-Commit for Nonstationary Linear Bandits with Latent Dynamics

We study a nonstationary bandit problem where rewards depend on both actions and latent states, the latter governed by unknown linear dynamics. Crucially, the state dynamics also depend on the actions, resulting in tension between…

Machine Learning · Computer Science 2025-10-21 Sunmook Choi, Yahya Sattar, Yassir Jedra, Maryam Fazel, Sarah Dean

On Explore-Then-Commit Strategies

We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards. Our objective is to use this simple setting to illustrate that strategies based on an exploration phase (up to a stopping time) followed by…

Statistics Theory · Mathematics 2016-11-15 Aurélien Garivier, Emilie Kaufmann, Tor Lattimore

Constant or logarithmic regret in asynchronous multiplayer bandits

Multiplayer bandits have recently been extensively studied because of their application to cognitive radio networks. While the literature mostly considers synchronous players, radio networks (e.g. for IoT) tend to have asynchronous devices.…

Machine Learning · Computer Science 2023-06-01 Hugo Richard, Etienne Boursier, Vianney Perchet

Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit

We address the online unconstrained submodular maximization problem (Online USM), in a setting with stochastic bandit feedback. In this framework, a decision-maker receives noisy rewards from a non-monotone submodular function taking values…

Machine Learning · Computer Science 2025-02-13 Julien Zhou, Pierre Gaillard, Thibaud Rahier, Julyan Arbel

Two-Player Zero-Sum Games with Bandit Feedback

We study a two-player zero-sum game in which the row player aims to maximize their payoff against a competing column player, under an unknown payoff matrix estimated through bandit feedback. We propose three algorithms based on the…

Machine Learning · Computer Science 2026-02-20 Elif Yılmaz, Christos Dimitrakakis

Fast and Regret Optimal Best Arm Identification: Fundamental Limits and Low-Complexity Algorithms

This paper considers a stochastic Multi-Armed Bandit (MAB) problem with dual objectives: (i) quick identification and commitment to the optimal arm, and (ii) reward maximization throughout a sequence of $T$ consecutive rounds. Though each…

Machine Learning · Computer Science 2024-05-31 Qining Zhang, Lei Ying

Contextual Bandit Optimization with Pre-Trained Neural Networks

Bandit optimization is a difficult problem, especially if the reward model is high-dimensional. When rewards are modeled by neural networks, sublinear regret has only been shown under strong assumptions, usually when the network is…

Machine Learning · Computer Science 2025-01-14 Mikhail Terekhov

Probe-then-Commit Multi-Objective Bandits: Theoretical Benefits of Limited Multi-Arm Feedback

We study an online resource-selection problem motivated by multi-radio access selection and mobile edge computing offloading. In each round, an agent chooses among $K$ candidate links/servers (arms) whose performance is a stochastic…

Machine Learning · Computer Science 2026-02-23 Ming Shi

Non-Asymptotic Analysis of a UCB-based Top Two Algorithm

A Top Two sampling rule for bandit identification is a method which selects the next arm to sample from among two candidate arms, a leader and a challenger. Due to their simplicity and good empirical performance, they have received…

Machine Learning · Statistics 2023-11-08 Marc Jourdan, Rémy Degenne

Achieving Exponential Asymptotic Optimality in Average-Reward Restless Bandits without Global Attractor Assumption

We consider the infinite-horizon average-reward restless bandit problem. We propose a novel two-set policy that maintains two dynamic subsets of arms: one subset of arms has a nearly optimal state distribution and takes actions…

Machine Learning · Computer Science 2024-10-18 Yige Hong, Qiaomin Xie, Yudong Chen, Weina Wang

Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems

In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward models. At each time, a player selects one arm to play, aiming to maximize the total expected reward over a horizon of length T. An approach based on…

Optimization and Control · Mathematics 2013-03-12 Sattar Vakili, Keqin Liu, Qing Zhao

The Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms

We investigate a Bayesian $k$-armed bandit problem in the many-armed regime, where $k \geq \sqrt{T}$ and $T$ represents the time horizon. Initially, and aligned with recent literature on many-armed bandit problems, we observe that…

Machine Learning · Computer Science 2024-03-21 Mohsen Bayati, Nima Hamidi, Ramesh Johari, Khashayar Khosravi

Instance-Optimality in Interactive Decision Making: Toward a Non-Asymptotic Theory

We consider the development of adaptive, instance-dependent algorithms for interactive decision making (bandits, reinforcement learning, and beyond) that, rather than only performing well in the worst case, adapt to favorable properties of…

Machine Learning · Computer Science 2023-04-26 Andrew Wagenmaker, Dylan J. Foster

An Asymptotically Optimal Batched Algorithm for the Dueling Bandit Problem

We study the $K$-armed dueling bandit problem, a variation of the traditional multi-armed bandit problem in which feedback is obtained in the form of pairwise comparisons. Previous learning algorithms have focused on the $\textit{fully…

Machine Learning · Computer Science 2022-09-27 Arpit Agarwal, Rohan Ghuge, Viswanath Nagarajan

Efficient Pure Exploration for Combinatorial Bandits with Semi-Bandit Feedback

Combinatorial bandits with semi-bandit feedback generalize multi-armed bandits, where the agent chooses sets of arms and observes a noisy reward for each arm contained in the chosen set. The action set satisfies a given structure such as…

Machine Learning · Statistics 2021-01-22 Marc Jourdan, Mojmír Mutný, Johannes Kirschner, Andreas Krause

Optimal Batched Linear Bandits

We introduce the E$^4$ algorithm for the batched linear bandit problem, incorporating an Explore-Estimate-Eliminate-Exploit framework. With a proper choice of exploration rate, we prove E$^4$ achieves the finite-time minimax optimal regret…

Machine Learning · Computer Science 2024-06-07 Xuanfei Ren, Tianyuan Jin, Pan Xu

A Best-of-both-worlds Algorithm for Bandits with Delayed Feedback with Robustness to Excessive Delays

We propose a new best-of-both-worlds algorithm for bandits with variably delayed feedback. In contrast to prior work, which required prior knowledge of the maximal delay $d_{\mathrm{max}}$ and had a linear dependence of the regret on it,…

Machine Learning · Computer Science 2024-05-29 Saeed Masoudian, Julian Zimmert, Yevgeny Seldin