Related papers: Optimal anytime regret with two experts

200 papers

Algorithms with predictions is a recent framework that has been used to overcome pessimistic worst-case bounds in incomplete information settings. In the context of scheduling, very recent work has leveraged machine-learned predictions to…

Data Structures and Algorithms · Computer Science 2022-12-08 Eric Balkanski, Tingting Ou, Clifford Stein, Hao-Ting Wei

We propose an anytime online algorithm for the problem of learning a sequence of adversarial convex cost functions while approximately satisfying another sequence of adversarial online convex constraints. A sequential algorithm is called…

Machine Learning · Computer Science 2025-10-28 Dhruv Sarkar, Abhishek Sinha

We consider online convex optimization with a zero-order oracle feedback. In particular, the decision maker does not know the explicit representation of the time-varying cost functions, or their gradients. At each time step, she observes…

Optimization and Control · Mathematics 2020-05-05 Tatiana Tatarenko, Maryam Kamgarpour

Towards bridging classical optimal control and online learning, regret minimization has recently been proposed as a control design criterion. This competitive paradigm penalizes the loss relative to the optimal control actions chosen by a…

Systems and Control · Electrical Eng. & Systems 2023-06-27 Andrea Martin, Luca Furieri, Florian Dörfler, John Lygeros, Giancarlo Ferrari-Trecate

We study online convex optimization under stochastic sub-gradient observation faults, where we introduce adaptive algorithms with minimax optimal regret guarantees. We specifically study scenarios where our sub-gradient observations can be…

Machine Learning · Computer Science 2019-04-23 Hakan Gokcesu, Suleyman S. Kozat

In online learning, the dynamic regret metric chooses the reference (optimal) solution that may change over time, while the typical (static) regret metric assumes the reference solution to be constant over the whole time horizon. The…

Machine Learning · Computer Science 2019-09-04 Yawei Zhao, Shuang Qiu, Ji Liu

We consider a stochastic multi-armed bandit setting and study the problem of constrained regret minimization over a given time horizon. Each arm is associated with an unknown, possibly multi-dimensional distribution, and the merit of an arm…

Machine Learning · Computer Science 2023-01-05 Anmol Kagrecha, Jayakrishnan Nair, Krishna Jagannathan

Dealing with uncertainty in optimization parameters is an important and longstanding challenge. Typically, uncertain parameters are predicted accurately, and then a deterministic optimization problem is solved. However, the decisions…

Machine Learning · Computer Science 2025-08-11 Víctor Bucarey, Sophia Calderón, Gonzalo Muñoz, Frederic Semet

We study best-of-both-worlds algorithms for bandits with switching cost, recently addressed by Rouyer, Seldin and Cesa-Bianchi, 2021. We introduce a surprisingly simple and effective algorithm that simultaneously achieves minimax optimal…

Machine Learning · Computer Science 2022-11-03 Idan Amir, Guy Azov, Tomer Koren, Roi Livni

We consider a setting where a system learns to rank a fixed set of $m$ items. The goal is to produce good item rankings for users with diverse interests who interact online with the system for $T$ rounds. We consider a novel top-$1$ feedback…

Machine Learning · Computer Science 2016-08-24 Sougata Chaudhuri, Ambuj Tewari

Regret matching (RM) -- and its modern variants -- is a foundational online algorithm that has been at the heart of many AI breakthrough results in solving benchmark zero-sum games, such as poker. Yet, surprisingly little is known so far in…

Computer Science and Game Theory · Computer Science 2025-11-18 Ioannis Anagnostides, Emanuel Tewolde, Brian Hu Zhang, Ioannis Panageas, Vincent Conitzer, Tuomas Sandholm

We consider the problem of using observational bandit feedback data from multiple heterogeneous data sources to learn a personalized decision policy that robustly generalizes across diverse target settings. To achieve this, we propose a…

Machine Learning · Computer Science 2024-10-14 Aldo Gael Carranza, Susan Athey

We show how to take any two parameter-free online learning algorithms with different regret guarantees and obtain a single algorithm whose regret is the minimum of the two base algorithms. Our method is embarrassingly simple: just add the…

Machine Learning · Statistics 2019-02-26 Ashok Cutkosky

We study the problem of minimising regret in two-armed bandit problems with Gaussian rewards. Our objective is to use this simple setting to illustrate that strategies based on an exploration phase (up to a stopping time) followed by…

Statistics Theory · Mathematics 2016-11-15 Aurélien Garivier, Emilie Kaufmann, Tor Lattimore

In recent years, significant attention has been directed towards learning average-reward Markov Decision Processes (MDPs). However, existing algorithms either suffer from sub-optimal regret guarantees or computational inefficiencies. In…

Machine Learning · Computer Science 2024-06-04 Victor Boone, Zihan Zhang

Multi-armed bandit (MAB) algorithms have achieved significant success in sequential decision-making applications, under the premise that humans perfectly implement the recommended policy. However, existing methods often overlook the crucial…

Machine Learning · Statistics 2024-10-07 Changxiao Cai, Jiacheng Zhang

We consider the classical stochastic multi-armed bandit problem with a constraint that limits the total cost incurred by switching between actions to be no larger than a given switching budget. For this problem, we prove matching upper and…

Machine Learning · Computer Science 2021-03-22 David Simchi-Levi, Yunzong Xu

We investigate the problem of continuous-time causal estimation under a minimax criterion. Let $X^T = \{X_t,0\leq t\leq T\}$ be governed by the probability law $P_{\theta}$ from a class of possible laws indexed by $\theta \in \Lambda$, and…

Information Theory · Computer Science 2014-07-09 Albert No, Tsachy Weissman

Finding numerical approximations to minimax regret treatment rules is of key interest. To do so when potential outcomes are in {0,1} we discretize the action space of nature and apply a variant of Robinson's (1951) algorithm for iterative…

Econometrics · Economics 2025-03-17 Patrik Guggenberger, Jiaqi Huang

We consider the nonstochastic multi-agent multi-armed bandit problem with agents collaborating via a communication network with delays. We show a lower bound for individual regret of all agents. We show that with suitable regularizers and…

Machine Learning · Statistics 2023-10-24 Jialin Yi, Milan Vojnović