Related papers: Online Learning with Imperfect Hints

Logarithmic Regret from Sublinear Hints

We consider the online linear optimization problem, where at every step the algorithm plays a point $x_t$ in the unit ball, and suffers loss $\langle c_t, x_t\rangle$ for some cost vector $c_t$ that is then revealed to the algorithm. Recent…

Machine Learning · Computer Science 2021-11-10 Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

Online Linear Optimization with Many Hints

We study an online linear optimization (OLO) problem in which the learner is provided access to $K$ "hint" vectors in each round prior to making a decision. In this setting, we devise an algorithm that obtains logarithmic regret whenever…

Machine Learning · Computer Science 2020-10-08 Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

Online Learning with Unknown Constraints

We consider the problem of online learning where the sequence of actions played by the learner must adhere to an unknown safety constraint at every round. The goal is to minimize regret with respect to the best safe action in hindsight…

Machine Learning · Computer Science 2024-03-08 Karthik Sridharan, Seung Won Wilson Yoo

A note on continuous-time online learning

In online learning, the data is provided in a sequential order, and the goal of the learner is to make online decisions to minimize overall regrets. This note is concerned with continuous-time models and algorithms for several online…

Machine Learning · Statistics 2024-05-20 Lexing Ying

Online Optimization: Competing with Dynamic Comparators

Recent literature on online learning has focused on developing adaptive algorithms that take advantage of a regularity of the sequence of observations, yet retain worst-case performance guarantees. A complementary direction is to develop…

Machine Learning · Computer Science 2015-01-27 Ali Jadbabaie, Alexander Rakhlin, Shahin Shahrampour, Karthik Sridharan

A continuous-time approach to online optimization

We consider a family of learning strategies for online optimization problems that evolve in continuous time and we show that they lead to no regret. From a more traditional, discrete-time viewpoint, this continuous-time approach allows us…

Optimization and Control · Mathematics 2014-02-28 Joon Kwon, Panayotis Mertikopoulos

No-Regret Learnability for Piecewise Linear Losses

In the convex optimization approach to online regret minimization, many methods have been developed to guarantee an $O(\sqrt{T})$ bound on regret for subdifferentiable convex loss functions with bounded subgradients, by using a reduction to…

Machine Learning · Computer Science 2016-09-20 Arthur Flajolet, Patrick Jaillet

Temporal Variability in Implicit Online Learning

In the setting of online learning, Implicit algorithms turn out to be highly successful from a practical standpoint. However, the tightest regret analyses only show marginal improvements over Online Mirror Descent. In this work, we shed…

Machine Learning · Computer Science 2020-11-10 Nicolò Campolongo, Francesco Orabona

Online Learning with Off-Policy Feedback

We study the problem of online learning in adversarial bandit problems under a partial observability model called off-policy feedback. In this sequential decision making problem, the learner cannot directly observe its rewards, but instead…

Machine Learning · Computer Science 2022-07-20 Germano Gabbianelli, Matteo Papini, Gergely Neu

Online Learning and Bandits with Queried Hints

We consider the classic online learning and stochastic multi-armed bandit (MAB) problems, when at each step, the online policy can probe and find out which of a small number ($k$) of choices has better reward (or loss) before making its…

Data Structures and Algorithms · Computer Science 2022-11-08 Aditya Bhaskara, Sreenivas Gollapudi, Sungjin Im, Kostas Kollias, Kamesh Munagala

No-Regret and Incentive-Compatible Online Learning

We study online learning settings in which experts act strategically to maximize their influence on the learning algorithm's predictions by potentially misreporting their beliefs about a sequence of binary events. Our goal is twofold.…

Machine Learning · Computer Science 2020-07-02 Rupert Freeman, David M. Pennock, Chara Podimata, Jennifer Wortman Vaughan

Online estimation and control with optimal pathlength regret

A natural goal when designing online learning algorithms for non-stationary environments is to bound the regret of the algorithm in terms of the temporal variation of the input sequence. Intuitively, when the variation is small, it should…

Machine Learning · Computer Science 2021-12-08 Gautam Goel, Babak Hassibi

A closer look at temporal variability in dynamic online learning

This work focuses on the setting of dynamic regret in the context of online learning with full information. In particular, we analyze regret bounds with respect to the temporal variability of the loss functions. By assuming that the…

Machine Learning · Computer Science 2021-02-16 Nicolò Campolongo, Francesco Orabona

Online Learning with Bounded Recall

We study the problem of full-information online learning in the "bounded recall" setting popular in the study of repeated games. An online learning algorithm $\mathcal{A}$ is $M$-$\textit{bounded-recall}$ if its output at time $t$ can be…

Machine Learning · Computer Science 2024-06-04 Jon Schneider, Kiran Vodrahalli

Leveraging Initial Hints for Free in Stochastic Linear Bandits

We study the setting of optimizing with bandit feedback with additional prior knowledge provided to the learner in the form of an initial hint of the optimal action. We present a novel algorithm for stochastic linear bandits that uses this…

Machine Learning · Computer Science 2022-03-09 Ashok Cutkosky, Chris Dann, Abhimanyu Das, Qiuyi Zhang

Efficient Optimal Learning for Contextual Bandits

We address the problem of learning in an online setting where the learner repeatedly observes features, selects among a set of actions, and receives reward for the action taken. We provide the first efficient algorithm with an optimal…

Machine Learning · Computer Science 2011-06-17 Miroslav Dudik, Daniel Hsu, Satyen Kale, Nikos Karampatziakis, John Langford, Lev Reyzin, Tong Zhang

Unconstrained Online Linear Learning in Hilbert Spaces: Minimax Algorithms and Normal Approximations

We study algorithms for online linear optimization in Hilbert spaces, focusing on the case where the player is unconstrained. We develop a novel characterization of a large class of minimax algorithms, recovering, and even improving,…

Machine Learning · Computer Science 2014-05-22 H. Brendan McMahan, Francesco Orabona

Near Optimal Memory-Regret Tradeoff for Online Learning

In the experts problem, on each of $T$ days, an agent needs to follow the advice of one of $n$ "experts". After each day, the loss associated with each expert's advice is revealed. A fundamental result in learning theory says that the…

Data Structures and Algorithms · Computer Science 2023-03-10 Binghui Peng, Aviad Rubinstein

Regret in Online Combinatorial Optimization

We address online linear optimization problems when the possible actions of the decision maker are represented by binary vectors. The regret of the decision maker is the difference between her realized loss and the best loss she would have…

Machine Learning · Computer Science 2013-04-02 Jean-Yves Audibert, Sébastien Bubeck, Gábor Lugosi

First-order regret bounds for combinatorial semi-bandits

We consider the problem of online combinatorial optimization under semi-bandit feedback, where a learner has to repeatedly pick actions from a combinatorial decision set in order to minimize the total losses associated with its decisions.…

Machine Learning · Computer Science 2015-06-11 Gergely Neu