Related papers: Optimal anytime regret with two experts

200 papers

We study linear bandits when the underlying reward function is not linear. Existing work relies on a uniform misspecification parameter $\epsilon$ that measures the sup-norm error of the best linear approximation. This results in an…

Machine Learning · Computer Science 2023-07-21 Chong Liu, Ming Yin, Yu-Xiang Wang

We study the optimal batch-regret tradeoff for batch linear contextual bandits. For any batch number $M$, number of actions $K$, time horizon $T$, and dimension $d$, we provide an algorithm and prove its regret guarantee, which, due to…

Machine Learning · Computer Science 2022-10-18 Zihan Zhang, Xiangyang Ji, Yuan Zhou

We study the classic online learning problem of predicting with expert advice, and propose a truly parameter-free and adaptive algorithm that achieves several objectives simultaneously without using any prior information. The main component…

Machine Learning · Computer Science 2015-02-23 Haipeng Luo, Robert E. Schapire

We provide an online learning algorithm that obtains regret $G\|w_\star\|\sqrt{T\log(\|w_\star\|G\sqrt{T})} + \|w_\star\|^2 + G^2$ on $G$-Lipschitz convex losses for any comparison point $w_\star$ without knowing either $G$ or…

Machine Learning · Computer Science 2024-06-03 Ashok Cutkosky, Zakaria Mhammedi

Recent literature has made much progress in understanding online LQR: a modern learning-theoretic take on the classical control problem in which a learner attempts to optimally control an unknown linear dynamical system with fully…

Machine Learning · Computer Science 2020-10-06 Max Simchowitz

We consider the online distributed non-stochastic experts problem, where the distributed system consists of one coordinator node that is connected to $k$ sites, and the sites are required to communicate with each other via the coordinator.…

Machine Learning · Computer Science 2012-11-15 Varun Kanade, Zhenming Liu, Bozidar Radunovic

Online learning algorithms that minimize regret provide strong guarantees in situations that involve repeatedly making decisions in an uncertain environment, e.g. a driver deciding what route to drive to work every day. While regret…

Computer Science and Game Theory · Computer Science 2013-09-06 Jeremiah Blocki, Nicolas Christin, Anupam Datta, Arunesh Sinha

We investigate online convex optimization in changing environments, and choose the adaptive regret as the performance measure. The goal is to achieve a small regret over every interval so that the comparator is allowed to change over time.…

Machine Learning · Computer Science 2019-06-18 Lijun Zhang, Tie-Yan Liu, Zhi-Hua Zhou

We study online reinforcement learning in linear Markov decision processes with adversarial losses and bandit feedback, without prior knowledge on transitions or access to simulators. We introduce two algorithms that achieve improved regret…

Machine Learning · Computer Science 2023-10-19 Haolin Liu, Chen-Yu Wei, Julian Zimmert

In two-player zero-sum games, the learning dynamic based on optimistic Hedge achieves one of the best-known regret upper bounds among strongly-uncoupled learning dynamics. With an appropriately chosen learning rate, the social and…

Machine Learning · Computer Science 2025-10-14 Taira Tsuchiya

We consider the problem of controlling a Linear Quadratic Regulator (LQR) system over a finite horizon $T$ with fixed and known cost matrices $Q,R$, but unknown and non-stationary dynamics $\{A_t, B_t\}$. The sequence of dynamics matrices…

Machine Learning · Computer Science 2022-03-21 Yuwei Luo, Varun Gupta, Mladen Kolar

We investigate online convex optimization in non-stationary environments and choose the dynamic regret as the performance measure, defined as the difference between cumulative loss incurred by the online algorithm and that of any feasible…

Machine Learning · Computer Science 2020-12-01 Peng Zhao, Yu-Jie Zhang, Lijun Zhang, Zhi-Hua Zhou

We propose the first reduction-based approach to obtaining long-term memory guarantees for online learning in the sense of Bousquet and Warmuth (2002), by reducing the problem to achieving typical switching regret. Specifically, for the…

Machine Learning · Computer Science 2019-10-29 Kai Zheng, Haipeng Luo, Ilias Diakonikolas, Liwei Wang

In this paper, we consider the multi-armed bandit problem with high-dimensional features. First, we prove a minimax lower bound, $\mathcal{O}\big((\log d)^{\frac{\alpha+1}{2}}T^{\frac{1-\alpha}{2}}+\log T\big)$, for the cumulative regret,…

Machine Learning · Computer Science 2021-09-27 Ke Li, Yun Yang, Naveen N. Narisetty

I present the first algorithm for stochastic finite-armed bandits that simultaneously enjoys order-optimal problem-dependent regret and worst-case regret. Besides the theoretical results, the new algorithm is simple, efficient and…

Machine Learning · Computer Science 2016-02-25 Tor Lattimore

Learning from repeated play in a fixed two-player zero-sum game is a classic problem in game theory and online learning. We consider a variant of this problem where the game payoff matrix changes over time, possibly in an adversarial…

Machine Learning · Computer Science 2022-02-01 Mengxiao Zhang, Peng Zhao, Haipeng Luo, Zhi-Hua Zhou

We consider a basic problem at the interface of two fundamental fields: submodular optimization and online learning. In the online unconstrained submodular maximization (online USM) problem, there is a universe $[n]=\{1,2,\ldots,n\}$ and a…

Machine Learning · Computer Science 2018-06-12 Tim Roughgarden, Joshua R. Wang

We study agents communicating over an underlying network by exchanging messages, in order to optimize their individual regret in a common nonstochastic multi-armed bandit problem. We derive regret minimization algorithms that guarantee for…

Machine Learning · Computer Science 2019-11-19 Yogev Bar-On, Yishay Mansour

In this work, we develop linear bandit algorithms that automatically adapt to different environments. By plugging a novel loss estimator into the optimization problem that characterizes the instance-optimal strategy, our first algorithm not…

Machine Learning · Computer Science 2021-06-15 Chung-Wei Lee, Haipeng Luo, Chen-Yu Wei, Mengxiao Zhang, Xiaojin Zhang

We investigate the problem of bandits with expert advice when the experts are fixed and known distributions over the actions. Improving on previous analyses, we show that the regret in this setting is controlled by information-theoretic…

Machine Learning · Computer Science 2023-03-16 Khaled Eldowa, Nicolò Cesa-Bianchi, Alberto Maria Metelli, Marcello Restelli