Related papers: Optimal anytime regret with two experts

200 papers

We consider an original problem that arises from the issue of security analysis of a power system and that we name optimal discovery with probabilistic expert advice. We address it with an algorithm based on the optimistic paradigm and on…

Machine Learning · Computer Science 2013-04-02 Sebastien Bubeck, Damien Ernst, Aurelien Garivier

In this paper, we investigate the existence of online learning algorithms with bandit feedback that simultaneously guarantee $O(1)$ regret compared to a given comparator strategy, and $\tilde{O}(\sqrt{T})$ regret compared to any fixed…

Machine Learning · Computer Science 2025-06-05 Adrian Müller, Jon Schneider, Stratis Skoulakis, Luca Viano, Volkan Cevher

For decision making under uncertainty, min-max regret has been established as a popular methodology to find robust solutions. In this approach, we compare the performance of our solution against the best possible performance had we known…

Optimization and Control · Mathematics 2021-11-25 Marc Goerigk, Michael Hartisch

Optimising queries in real-world situations under imperfect conditions is still a problem that has not been fully solved. We consider finding the optimal order in which to execute a given set of selection operators under partial ignorance…

Databases · Computer Science 2015-07-30 Khaled H. Alyoubi, Sven Helmer, Peter T. Wood

We consider online learning problems in the realizable setting, where there is a zero-loss solution, and propose new Differentially Private (DP) algorithms that obtain near-optimal regret bounds. For the problem of online prediction from…

Machine Learning · Computer Science 2023-03-01 Hilal Asi, Vitaly Feldman, Tomer Koren, Kunal Talwar

Sequential learning with feedback graphs is a natural extension of the multi-armed bandit problem where the problem is equipped with an underlying graph structure that provides additional information - playing an action reveals the losses…

Machine Learning · Computer Science 2023-06-06 Tomáš Kocák, Alexandra Carpentier

We determine the minimax optimal expected regret in the classic non-stochastic multi-armed bandit with expert advice problem, by proving a lower bound that matches the upper bound of Kale (2014). The two bounds determine the minimax optimal…

Machine Learning · Computer Science 2025-11-04 Zachary Chase, Shinji Ito, Idan Mehalel

We study the dynamic regret of the multi-armed bandit and experts problems in non-stationary stochastic environments. We introduce a new parameter $\Lambda$, which measures the total statistical variance of the loss distributions over $T$ rounds…

Machine Learning · Computer Science 2019-06-24 Chen-Yu Wei, Yi-Te Hong, Chi-Jen Lu

We present simple and efficient algorithms for the batched stochastic multi-armed bandit and batched stochastic linear bandit problems. We prove bounds for their expected regrets that improve over the best-known regret bounds for any number…

Data Structures and Algorithms · Computer Science 2020-02-19 Hossein Esfandiari, Amin Karbasi, Abbas Mehrabian, Vahab Mirrokni

This letter studies the problem of online multi-step-ahead prediction for unknown linear stochastic systems. Using conditional distribution theory, we derive an optimal parameterization of the prediction policy as a linear function of…

Machine Learning · Computer Science 2025-11-18 Jiachen Qian, Yang Zheng

We consider online learning problems where the aim is to achieve regret which is efficient in the sense that it is of the same order as the lowest regret amongst $K$ experts. This is a substantially stronger requirement than achieving…

Machine Learning · Computer Science 2019-11-12 Daron Anderson, Douglas J. Leith

We introduce algorithms for online, full-information prediction that are competitive with contextual tree experts of unknown complexity, in both probabilistic and adversarial settings. We show that by incorporating a probabilistic framework…

Machine Learning · Computer Science 2018-05-23 Vidya Muthukumar, Mitas Ray, Anant Sahai, Peter L. Bartlett

We consider model selection in stochastic bandit and reinforcement learning problems. Given a set of base learning algorithms, an effective model selection strategy adapts to the best learning algorithm in an online fashion. We show that by…

Machine Learning · Computer Science 2020-06-11 Yasin Abbasi-Yadkori, Aldo Pacchiano, My Phan

We study the problem of determining an effective exploration strategy in static and non-linear optimization problems, which depend on an unknown scalar parameter to be learned from online collected noisy data. An optimal trade-off between…

Optimization and Control · Mathematics 2024-09-13 Ying Wang, Mirko Pasquini, Kévin Colin, Håkan Hjalmarsson

We revisit the problem of stochastic online learning with feedback graphs, with the goal of devising algorithms that are optimal, up to constants, both asymptotically and in finite time. We show that, surprisingly, the notion of optimal…

Machine Learning · Computer Science 2022-06-22 Teodor V. Marinov, Mehryar Mohri, Julian Zimmert

Past research on interactive decision making problems (bandits, reinforcement learning, etc.) mostly focuses on the minimax regret that measures the algorithm's performance on the hardest instance. However, an ideal algorithm should adapt…

Machine Learning · Computer Science 2023-06-13 Kefan Dong, Tengyu Ma

In the framework of prediction of individual sequences, sequential prediction methods are to be constructed that perform nearly as well as the best expert from a given class. We consider prediction strategies that compete with the class of…

Machine Learning · Computer Science 2012-07-12 András Gyorgy, Tamás Linder, Gábor Lugosi

Some of the most compelling applications of online convex optimization, including online prediction and classification, are unconstrained: the natural feasible set is $\mathbb{R}^n$. Existing algorithms fail to achieve sub-linear regret in this…

Machine Learning · Computer Science 2012-11-13 Matthew Streeter, H. Brendan McMahan

We provide consistent random algorithms for sequential decision under partial monitoring, i.e., when the decision maker does not observe the outcomes but instead receives random feedback signals. Those algorithms have no internal regret in…

Machine Learning · Computer Science 2011-02-23 Vianney Perchet

When dealing with time series with complex non-stationarities, low retrospective regret on individual realizations is a more appropriate goal than low prospective risk in expectation. Online learning algorithms provide powerful guarantees…

Machine Learning · Statistics 2011-06-30 Cosma Rohilla Shalizi, Abigail Z. Jacobs, Kristina Lisa Klinkner, Aaron Clauset