Related papers: Optimal anytime regret with two experts

Quick Best Action Identification in Linear Bandit Problems

In this paper, we consider a best action identification problem in the stochastic linear bandit setup with a fixed-confidence constraint. In the considered best action identification problem, instead of minimizing the cumulative regret as…

Machine Learning · Computer Science 2018-12-04 Jun Geng, Lifeng Lai

Learning to Route Efficiently with End-to-End Feedback: The Value of Networked Structure

We introduce efficient algorithms which achieve nearly optimal regrets for the problem of stochastic online shortest path routing with end-to-end feedback. The setting is a natural application of the combinatorial stochastic bandits…

Machine Learning · Computer Science 2018-12-20 Ruihao Zhu, Eytan Modiano

Regret-Optimal Filtering for Prediction and Estimation

The filtering problem of causally estimating a desired signal from a related observation signal is investigated through the lens of regret optimization. Classical filter designs, such as $\mathcal H_2$ (Kalman) and $\mathcal H_\infty$,…

Optimization and Control · Mathematics 2022-11-23 Oron Sabag, Babak Hassibi

Towards minimax policies for online linear optimization with bandit feedback

We address the online linear optimization problem with bandit feedback. Our contribution is twofold. First, we provide an algorithm (based on exponential weights) with a regret of order $\sqrt{d n \log N}$ for any finite action set with $N$…

Machine Learning · Computer Science 2012-02-15 Sébastien Bubeck, Nicolò Cesa-Bianchi, Sham M. Kakade

Adaptive Regret for Bandits Made Possible: Two Queries Suffice

Fast changing states or volatile environments pose a significant challenge to online optimization, which needs to perform rapid adaptation under limited observation. In this paper, we give query and regret optimal bandit algorithms under…

Machine Learning · Computer Science 2024-01-18 Zhou Lu, Qiuyi Zhang, Xinyi Chen, Fred Zhang, David Woodruff, Elad Hazan

Analysis of Thompson Sampling for the multi-armed bandit problem

The multi-armed bandit problem is a popular model for studying exploration/exploitation trade-off in sequential decision problems. Many algorithms are now available for this well-studied problem. One of the earliest algorithms, given by W.…

Machine Learning · Computer Science 2012-04-10 Shipra Agrawal, Navin Goyal

Dynamic Regret Bounds for Online Omniprediction with Long Term Constraints

We present an algorithm guaranteeing dynamic regret bounds for online omniprediction with long term constraints. The goal in this recently introduced problem is for a learner to generate a sequence of predictions which are broadcast to a…

Machine Learning · Computer Science 2025-10-09 Yahav Bechavod, Jiuyao Lu, Aaron Roth

An Information-Theoretic Approach to Minimax Regret in Partial Monitoring

We prove a new minimax theorem connecting the worst-case Bayesian regret and minimax regret under partial monitoring with no assumptions on the space of signals or decisions of the adversary. We then generalise the information-theoretic…

Machine Learning · Computer Science 2019-05-30 Tor Lattimore, Csaba Szepesvari

Adaptivity and Non-stationarity: Problem-dependent Dynamic Regret for Online Convex Optimization

We investigate online convex optimization in non-stationary environments and choose dynamic regret as the performance measure, defined as the difference between cumulative loss incurred by the online algorithm and that of any feasible…

Machine Learning · Computer Science 2024-04-09 Peng Zhao, Yu-Jie Zhang, Lijun Zhang, Zhi-Hua Zhou

A conversion theorem and minimax optimality for continuum contextual bandits

We study the contextual continuum bandits problem, where the learner sequentially receives a side information vector and has to choose an action in a convex set, minimizing a function associated with the context. The goal is to minimize all…

Machine Learning · Statistics 2025-10-28 Arya Akhavan, Karim Lounici, Massimiliano Pontil, Alexandre B. Tsybakov

Convergence Analysis of Optimization Algorithms

The regret bound of an optimization algorithm is one of the basic criteria for evaluating the performance of the given algorithm. By inspecting the differences between the regret bounds of traditional algorithms and adaptive ones, we…

Machine Learning · Statistics 2017-07-07 HyoungSeok Kim, JiHoon Kang, WooMyoung Park, SukHyun Ko, YoonHo Cho, DaeSung Yu, YoungSook Song, JungWon Choi

Logarithmic Regret in Adaptive Control of Noisy Linear Quadratic Regulator Systems Using Hints

The problem of regret minimization for online adaptive control of linear-quadratic systems is studied. In this problem, the true system transition parameters (matrices $A$ and $B$) are unknown, and the objective is to design and analyze…

Optimization and Control · Mathematics 2022-10-31 Mohammad Akbari, Bahman Gharesifard, Tamas Linder

Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces

Designing efficient general-purpose contextual bandit algorithms that work with large -- or even continuous -- action spaces would facilitate application to important scenarios such as information retrieval, recommendation systems, and…

Machine Learning · Computer Science 2022-07-14 Yinglun Zhu, Paul Mineiro

Low Regret Binary Sampling Method for Efficient Global Optimization of Univariate Functions

In this work, we propose a computationally efficient algorithm for the problem of global optimization in univariate loss functions. For the performance evaluation, we study the cumulative regret of the algorithm instead of the simple regret…

Machine Learning · Computer Science 2022-01-19 Kaan Gokcesu, Hakan Gokcesu

Empirical Bayes Regret Minimization

Most bandit algorithm designs are purely theoretical. Therefore, they have strong regret guarantees, but also are often too conservative in practice. In this work, we pioneer the idea of algorithm design by minimizing the empirical Bayes…

Machine Learning · Computer Science 2020-06-12 Chih-Wei Hsu, Branislav Kveton, Ofer Meshi, Martin Mladenov, Csaba Szepesvari

Joint Stabilization and Regret Minimization through Switching in Over-Actuated Systems (extended version)

Adaptively controlling and minimizing regret in unknown dynamical systems while controlling the growth of the system state is crucial in real-world applications. In this work, we study the problem of stabilization and regret minimization of…

Systems and Control · Electrical Eng. & Systems 2022-02-10 Jafar Abbaszadeh Chekan, Kamyar Azizzadenesheli, Cedric Langbort

Private Online Prediction from Experts: Separations and Faster Rates

Online prediction from experts is a fundamental problem in machine learning and several works have studied this problem under privacy constraints. We propose and analyze new algorithms for this problem that improve over the regret bounds of…

Machine Learning · Computer Science 2023-07-03 Hilal Asi, Vitaly Feldman, Tomer Koren, Kunal Talwar

Randomized Minmax Regret for Combinatorial Optimization Under Uncertainty

The minmax regret problem for combinatorial optimization under uncertainty can be viewed as a zero-sum game played between an optimizing player and an adversary, where the optimizing player selects a solution and the adversary selects costs…

Discrete Mathematics · Computer Science 2014-09-23 Andrew Mastin, Patrick Jaillet, Sang Chin

The Nonstochastic Control Problem

We consider the problem of controlling an unknown linear dynamical system in the presence of (nonstochastic) adversarial perturbations and adversarial convex loss functions. In contrast to classical control, the a priori determination of an…

Machine Learning · Computer Science 2020-01-22 Elad Hazan, Sham M. Kakade, Karan Singh

Incentive-compatible Bandits: Importance Weighting No More

We study the problem of incentive-compatible online learning with bandit feedback. In this class of problems, the experts are self-interested agents who might misrepresent their preferences with the goal of being selected most often. The…

Machine Learning · Computer Science 2024-05-13 Julian Zimmert, Teodor V. Marinov