Related papers: Optimal anytime regret with two experts

No-Regret Linear Bandits beyond Realizability

We study linear bandits when the underlying reward function is not linear. Existing work relies on a uniform misspecification parameter $\epsilon$ that measures the sup-norm error of the best linear approximation. This results in an…

Machine Learning · Computer Science 2023-07-21 Chong Liu, Ming Yin, Yu-Xiang Wang

Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits

We study the optimal batch-regret tradeoff for batch linear contextual bandits. For any batch number $M$, number of actions $K$, time horizon $T$, and dimension $d$, we provide an algorithm and prove its regret guarantee, which, due to…

Machine Learning · Computer Science 2022-10-18 Zihan Zhang, Xiangyang Ji, Yuan Zhou

Achieving All with No Parameters: Adaptive NormalHedge

We study the classic online learning problem of predicting with expert advice, and propose a truly parameter-free and adaptive algorithm that achieves several objectives simultaneously without using any prior information. The main component…

Machine Learning · Computer Science 2015-02-23 Haipeng Luo, Robert E. Schapire

Fully Unconstrained Online Learning

We provide an online learning algorithm that obtains regret $G\|w_\star\|\sqrt{T\log(\|w_\star\|G\sqrt{T})} + \|w_\star\|^2 + G^2$ on $G$-Lipschitz convex losses for any comparison point $w_\star$ without knowing either $G$ or…

Machine Learning · Computer Science 2024-06-03 Ashok Cutkosky, Zakaria Mhammedi

Making Non-Stochastic Control (Almost) as Easy as Stochastic

Recent literature has made much progress in understanding online LQR: a modern learning-theoretic take on the classical control problem in which a learner attempts to optimally control an unknown linear dynamical system with fully…

Machine Learning · Computer Science 2020-10-06 Max Simchowitz

Distributed Non-Stochastic Experts

We consider the online distributed non-stochastic experts problem, where the distributed system consists of one coordinator node that is connected to $k$ sites, and the sites are required to communicate with each other via the coordinator.…

Machine Learning · Computer Science 2012-11-15 Varun Kanade, Zhenming Liu, Bozidar Radunovic

Adaptive Regret Minimization in Bounded-Memory Games

Online learning algorithms that minimize regret provide strong guarantees in situations that involve repeatedly making decisions in an uncertain environment, e.g. a driver deciding what route to drive to work every day. While regret…

Computer Science and Game Theory · Computer Science 2013-09-06 Jeremiah Blocki, Nicolas Christin, Anupam Datta, Arunesh Sinha

Adaptive Regret of Convex and Smooth Functions

We investigate online convex optimization in changing environments, and choose the adaptive regret as the performance measure. The goal is to achieve a small regret over every interval so that the comparator is allowed to change over time.…

Machine Learning · Computer Science 2019-06-18 Lijun Zhang, Tie-Yan Liu, Zhi-Hua Zhou

Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback

We study online reinforcement learning in linear Markov decision processes with adversarial losses and bandit feedback, without prior knowledge on transitions or access to simulators. We introduce two algorithms that achieve improved regret…

Machine Learning · Computer Science 2023-10-19 Haolin Liu, Chen-Yu Wei, Julian Zimmert

Tight Regret Upper and Lower Bounds for Optimistic Hedge in Two-Player Zero-Sum Games

In two-player zero-sum games, the learning dynamic based on optimistic Hedge achieves one of the best-known regret upper bounds among strongly-uncoupled learning dynamics. With an appropriately chosen learning rate, the social and…

Machine Learning · Computer Science 2025-10-14 Taira Tsuchiya

Dynamic Regret Minimization for Control of Non-stationary Linear Dynamical Systems

We consider the problem of controlling a Linear Quadratic Regulator (LQR) system over a finite horizon $T$ with fixed and known cost matrices $Q,R$, but unknown and non-stationary dynamics $\{A_t, B_t\}$. The sequence of dynamics matrices…

Machine Learning · Computer Science 2022-03-21 Yuwei Luo, Varun Gupta, Mladen Kolar

Dynamic Regret of Convex and Smooth Functions

We investigate online convex optimization in non-stationary environments and choose the dynamic regret as the performance measure, defined as the difference between cumulative loss incurred by the online algorithm and that of any feasible…

Machine Learning · Computer Science 2020-12-01 Peng Zhao, Yu-Jie Zhang, Lijun Zhang, Zhi-Hua Zhou

Equipping Experts/Bandits with Long-term Memory

We propose the first reduction-based approach to obtaining long-term memory guarantees for online learning in the sense of Bousquet and Warmuth, 2002, by reducing the problem to achieving typical switching regret. Specifically, for the…

Machine Learning · Computer Science 2019-10-29 Kai Zheng, Haipeng Luo, Ilias Diakonikolas, Liwei Wang

Regret Lower Bound and Optimal Algorithm for High-Dimensional Contextual Linear Bandit

In this paper, we consider the multi-armed bandit problem with high-dimensional features. First, we prove a minimax lower bound, $\mathcal{O}\big((\log d)^{\frac{\alpha+1}{2}}T^{\frac{1-\alpha}{2}}+\log T\big)$, for the cumulative regret,…

Machine Learning · Computer Science 2021-09-27 Ke Li, Yun Yang, Naveen N. Narisetty

Optimally Confident UCB: Improved Regret for Finite-Armed Bandits

I present the first algorithm for stochastic finite-armed bandits that simultaneously enjoys order-optimal problem-dependent regret and worst-case regret. Besides the theoretical results, the new algorithm is simple, efficient and…

Machine Learning · Computer Science 2016-02-25 Tor Lattimore

No-Regret Learning in Time-Varying Zero-Sum Games

Learning from repeated play in a fixed two-player zero-sum game is a classic problem in game theory and online learning. We consider a variant of this problem where the game payoff matrix changes over time, possibly in an adversarial…

Machine Learning · Computer Science 2022-02-01 Mengxiao Zhang, Peng Zhao, Haipeng Luo, Zhi-Hua Zhou

An Optimal Algorithm for Online Unconstrained Submodular Maximization

We consider a basic problem at the interface of two fundamental fields: submodular optimization and online learning. In the online unconstrained submodular maximization (online USM) problem, there is a universe $[n]=\{1,2,...,n\}$ and a…

Machine Learning · Computer Science 2018-06-12 Tim Roughgarden, Joshua R. Wang

Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits

We study agents communicating over an underlying network by exchanging messages, in order to optimize their individual regret in a common nonstochastic multi-armed bandit problem. We derive regret minimization algorithms that guarantee for…

Machine Learning · Computer Science 2019-11-19 Yogev Bar-On, Yishay Mansour

Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously

In this work, we develop linear bandit algorithms that automatically adapt to different environments. By plugging a novel loss estimator into the optimization problem that characterizes the instance-optimal strategy, our first algorithm not…

Machine Learning · Computer Science 2021-06-15 Chung-Wei Lee, Haipeng Luo, Chen-Yu Wei, Mengxiao Zhang, Xiaojin Zhang

Information-Theoretic Regret Bounds for Bandits with Fixed Expert Advice

We investigate the problem of bandits with expert advice when the experts are fixed and known distributions over the actions. Improving on previous analyses, we show that the regret in this setting is controlled by information-theoretic…

Machine Learning · Computer Science 2023-03-16 Khaled Eldowa, Nicolò Cesa-Bianchi, Alberto Maria Metelli, Marcello Restelli