Related papers: An efficient algorithm for learning with semi-band…

Importance weighting without importance weights: An efficient algorithm for combinatorial semi-bandits

We propose a sample-efficient alternative for importance weighting for situations where one only has sample access to the probability distribution that generates the observations. Our new method, called Geometric Resampling (GR), is…

Machine Learning · Computer Science · 2016-09-02 · Gergely Neu, Gábor Bartók
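The Geometric Resampling idea in the entry above estimates importance weights using only sample access to the action distribution. The short Python sketch below is a minimal illustration under assumptions, not the authors' implementation. The function names `geometric_resampling` and `estimated_loss`, the cap `max_draws`, the symbol $q$, and the toy sampler are all hypothetical. If $q$ is the (unknown) probability that a given arm is included in a sampled action, the number of fresh draws until that arm shows up again is geometric with mean $1/q$, so it can stand in for the explicit weight $1/q$. Capping the count at `max_draws` keeps it bounded at the price of a small downward bias.

```python
import random
from typing import Callable, FrozenSet


def geometric_resampling(
    sample_action: Callable[[], FrozenSet[int]],
    arm: int,
    max_draws: int,
) -> int:
    """Count fresh draws from the (sample-only) action distribution until
    `arm` appears, stopping at `max_draws` (hypothetical helper).

    If q = P(arm in action), the uncapped count is Geometric(q) with mean 1/q;
    the capped count has mean (1 - (1 - q) ** max_draws) / q.
    """
    for k in range(1, max_draws + 1):
        if arm in sample_action():
            return k
    return max_draws


def estimated_loss(observed_loss: float, arm: int,
                   played: FrozenSet[int], k: int) -> float:
    """Surrogate for importance weighting: K * loss if the arm was played, else 0."""
    return k * observed_loss if arm in played else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    q = 0.25  # toy assumption: each of 4 arms is included independently w.p. q

    def toy_sampler() -> FrozenSet[int]:
        return frozenset(i for i in range(4) if rng.random() < q)

    counts = [geometric_resampling(toy_sampler, arm=2, max_draws=100)
              for _ in range(20_000)]
    print(sum(counts) / len(counts))  # should be close to 1 / q = 4
```

With the toy sampler, the empirical mean of the counts should land near $1/q = 4$. In a semi-bandit learner, the count would multiply the observed loss of each played arm, as `estimated_loss` does.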

First-order regret bounds for combinatorial semi-bandits

We consider the problem of online combinatorial optimization under semi-bandit feedback, where a learner has to repeatedly pick actions from a combinatorial decision set in order to minimize the total losses associated with its decisions.…

Machine Learning · Computer Science · 2015-06-11 · Gergely Neu

An Efficient Algorithm for Cooperative Semi-Bandits

We consider the problem of asynchronous online combinatorial optimization on a network of communicating agents. At each time step, some of the agents are stochastically activated, requested to make a prediction, and the system pays the…

Machine Learning · Computer Science · 2021-02-10 · Riccardo Della Vecchia, Tommaso Cesari

A Further Efficient Algorithm with Best-of-Both-Worlds Guarantees for $m$-Set Semi-Bandit Problem

This paper studies the optimality and complexity of the Follow-the-Perturbed-Leader (FTPL) policy in $m$-set semi-bandit problems. FTPL has been studied extensively as a promising candidate for an efficient algorithm with favorable regret for…

Machine Learning · Computer Science · 2026-03-13 · Botao Chen, Jongyeong Lee, Chansoo Kim, Junya Honda

Combinatorial Bandits with Relative Feedback

We consider combinatorial online learning with subset choices when only relative feedback information from subsets is available, instead of bandit or semi-bandit feedback which is absolute. Specifically, we study two regret minimisation…

Machine Learning · Computer Science · 2020-02-28 · Aadirupa Saha, Aditya Gopalan

Regret in Online Combinatorial Optimization

We address online linear optimization problems when the possible actions of the decision maker are represented by binary vectors. The regret of the decision maker is the difference between her realized loss and the best loss she would have…

Machine Learning · Computer Science · 2013-04-02 · Jean-Yves Audibert, Sébastien Bubeck, Gábor Lugosi

Note on Follow-the-Perturbed-Leader in Combinatorial Semi-Bandit Problems

This paper studies the optimality and complexity of the Follow-the-Perturbed-Leader (FTPL) policy in size-invariant combinatorial semi-bandit problems. Recently, Honda et al. (2023) and Lee et al. (2024) showed that FTPL achieves…

Machine Learning · Computer Science · 2025-07-23 · Botao Chen, Junya Honda

Follow-the-Perturbed-Leader Approaches Best-of-Both-Worlds for the m-Set Semi-Bandit Problems

We consider a common case of the combinatorial semi-bandit problem, the $m$-set semi-bandit, where the learner selects exactly $m$ of the $d$ arms. In the adversarial setting, the best regret bound, known to be…

Machine Learning · Computer Science · 2025-07-08 · Jingxin Zhan, Yuchen Xin, Chenjie Sun, Zhihua Zhang
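In the $m$-set case described in the entry above, exactly $m$ of the $d$ arms are played each round, so the FTPL step reduces to sorting. The sketch below is illustrative only. The names `frechet_sample` and `ftpl_m_set`, the Fréchet perturbation with shape 2, and the fixed learning rate `eta` are assumptions chosen for the example. It also omits the loss-estimation and learning-rate schedules that the regret analyses in these papers rely on. Each round subtracts a random perturbation, scaled by $1/\eta$, from the estimated cumulative losses and plays the $m$ arms with the smallest perturbed score, which is exactly the minimizer of a linear loss over all $m$-subsets.

```python
import math
import random
from typing import List, Sequence


def frechet_sample(rng: random.Random, shape: float) -> float:
    """Inverse-CDF draw from a Fréchet(shape) law, F(x) = exp(-x ** -shape), x > 0.
    The choice of perturbation law is an assumption for this sketch."""
    u = rng.random()
    while u == 0.0:  # math.log(0.0) would raise
        u = rng.random()
    return (-math.log(u)) ** (-1.0 / shape)


def ftpl_m_set(cum_loss_est: Sequence[float], m: int, eta: float,
               rng: random.Random, shape: float = 2.0) -> List[int]:
    """Play the m arms minimizing the perturbed score L_i - r_i / eta.

    Over the m-set decision set, the linear minimization oracle is just
    'take the m smallest coordinates', so each round is one sort of d scores.
    """
    scores = [L - frechet_sample(rng, shape) / eta for L in cum_loss_est]
    chosen = sorted(range(len(scores)), key=scores.__getitem__)[:m]
    return sorted(chosen)


if __name__ == "__main__":
    rng = random.Random(0)
    loss_est = [3.0, 0.5, 2.0, 0.1, 1.5, 4.0]  # hypothetical cumulative loss estimates
    print(ftpl_m_set(loss_est, m=3, eta=0.5, rng=rng))  # three sorted arm indices
```

FTPL never computes its sampling probabilities explicitly, which is why a sample-based estimator such as Geometric Resampling is the natural companion for building the loss estimates.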

A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs

We derive a new analysis of Follow The Regularized Leader (FTRL) for online learning with delayed bandit feedback. By separating the cost of delayed feedback from that of bandit feedback, our analysis allows us to obtain new results in…

Machine Learning · Computer Science · 2023-05-16 · Dirk van der Hoeven, Lukas Zierahn, Tal Lancewicki, Aviv Rosenberg, Nicolò Cesa-Bianchi

Adversarial Combinatorial Semi-bandits with Graph Feedback

In combinatorial semi-bandits, a learner repeatedly selects from a combinatorial decision set of arms, receives the realized sum of rewards, and observes the rewards of the individual selected arms as feedback. In this paper, we extend this…

Machine Learning · Computer Science · 2025-09-17 · Yuxiao Wen

Online Non-Convex Learning: Following the Perturbed Leader is Optimal

We study the problem of online learning with non-convex losses, where the learner has access to an offline optimization oracle. We show that the classical Follow the Perturbed Leader (FTPL) algorithm achieves the optimal regret rate of…

Machine Learning · Computer Science · 2019-09-24 · Arun Sai Suggala, Praneeth Netrapalli

Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits

A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as…

Machine Learning · Computer Science · 2017-06-08 · Branislav Kveton, Zheng Wen, Azin Ashkan, Csaba Szepesvari

Bandit Principal Component Analysis

We consider a partial-feedback variant of the well-studied online PCA problem where a learner attempts to predict a sequence of $d$-dimensional vectors in terms of a quadratic loss, while only having limited feedback about the environment's…

Machine Learning · Computer Science · 2019-02-11 · Wojciech Kotłowski, Gergely Neu

An Optimal Algorithm for Linear Bandits

We provide the first algorithm for online bandit linear optimization whose regret after $T$ rounds is of order $\sqrt{Td \ln N}$ on any finite class $X$ of $N$ actions in $d$ dimensions, and of order $d\sqrt{T}$ (up to log factors) when $X$ is infinite.…

Machine Learning · Computer Science · 2012-02-15 · Nicolò Cesa-Bianchi, Sham Kakade

Experimental Design for Regret Minimization in Linear Bandits

In this paper we propose a novel experimental design-based algorithm to minimize regret in online stochastic linear and combinatorial bandits. While existing literature tends to focus on optimism-based algorithms--which have been shown to…

Machine Learning · Computer Science · 2021-03-02 · Andrew Wagenmaker, Julian Katz-Samuels, Kevin Jamieson

Offline Local Search for Online Stochastic Bandits

Combinatorial multi-armed bandits provide a fundamental online decision-making environment where a decision-maker interacts with an environment across $T$ time steps, each time selecting an action and learning the cost of that action. The…

Machine Learning · Computer Science · 2026-04-13 · Gerdus Benadè, Rathish Das, Thomas Lavastida

Online combinatorial optimization with stochastic decision sets and adversarial losses

Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can…

Machine Learning · Computer Science · 2026-04-29 · Gergely Neu, Michal Valko

Online Learning of Strategic Defense against Ecological Adversaries under Partial Observability with Semi-Bandit Feedback

We introduce an online learning algorithm for computing adaptive resource allocation policies against strategic ecological adversaries with unknown behavioral models and partial observability. Our setting addresses a fundamental limitation…

Computational Engineering, Finance, and Science · Computer Science · 2026-03-13 · Anjali Purathekandy, Deepak N. Subramani

Efficient and Optimal No-Regret Caching under Partial Observation

Online learning algorithms have been successfully used to design caching policies with sublinear regret in the total number of requests, with no statistical assumption about the request sequence. Most existing algorithms involve…

Machine Learning · Computer Science · 2025-03-05 · Younes Ben Mazziane, Francescomaria Faticanti, Sara Alouf, Giovanni Neglia

Efficient learning by implicit exploration in bandit problems with side observations

We consider online learning problems under a partial observability model capturing situations where the information conveyed to the learner is between full information and bandit feedback. In the simplest variant, we assume that in addition…

Machine Learning · Computer Science · 2026-04-28 · Tomas Kocak, Gergely Neu, Michal Valko, Remi Munos