Related papers: Quantum Algorithm for Online Exp-concave Optimizat…

Quantum Algorithm for Online Convex Optimization

We explore whether quantum advantages can be found for the zeroth-order online convex optimization problem, which is also known as bandit convex optimization with multi-point feedback. In this setting, given access to zeroth-order oracles…

Quantum Physics · Physics 2022-04-04 Jianhao He, Feidiao Yang, Jialin Zhang, Lvzhou Li

Minimizing Regret of Bandit Online Optimization in Unconstrained Action Spaces

We consider online convex optimization with a zero-order oracle feedback. In particular, the decision maker does not know the explicit representation of the time-varying cost functions, or their gradients. At each time step, she observes…

Optimization and Control · Mathematics 2020-05-05 Tatiana Tatarenko, Maryam Kamgarpour

Second Order Methods for Bandit Optimization and Control

Bandit convex optimization (BCO) is a general framework for online decision making under uncertainty. While tight regret bounds for general convex losses have been established, existing algorithms achieving these bounds have prohibitive…

Machine Learning · Computer Science 2024-10-04 Arun Suggala, Y. Jennifer Sun, Praneeth Netrapalli, Elad Hazan

An optimal algorithm for bandit convex optimization

We consider the problem of online convex optimization against an arbitrary adversary with bandit feedback, known as bandit convex optimization. We give the first $\tilde{O}(\sqrt{T})$-regret algorithm for this setting based on a novel…

Machine Learning · Computer Science 2016-03-16 Elad Hazan, Yuanzhi Li

Incentive-compatible Bandits: Importance Weighting No More

We study the problem of incentive-compatible online learning with bandit feedback. In this class of problems, the experts are self-interested agents who might misrepresent their preferences with the goal of being selected most often. The…

Machine Learning · Computer Science 2024-05-13 Julian Zimmert, Teodor V. Marinov

Online Continuous Submodular Maximization

In this paper, we consider an online optimization process, where the objective functions are not convex (nor concave) but instead belong to a broad class of continuous submodular functions. We first propose a variant of the Frank-Wolfe…

Machine Learning · Statistics 2018-02-19 Lin Chen, Hamed Hassani, Amin Karbasi

Exploiting Curvature in Online Convex Optimization with Delayed Feedback

In this work, we study the online convex optimization problem with curved losses and delayed feedback. When losses are strongly convex, existing approaches obtain regret bounds of order $d_{\max} \ln T$, where $d_{\max}$ is the maximum…

Machine Learning · Computer Science 2025-06-10 Hao Qiu, Emmanuel Esposito, Mengxiao Zhang

A Reduction from Delayed to Immediate Feedback for Online Convex Optimization with Improved Guarantees

We develop a reduction-based framework for online learning with delayed feedback that recovers and improves upon existing results for both first-order and bandit convex optimization. Our approach introduces a continuous-time model under…

Machine Learning · Computer Science 2026-02-04 Alexander Ryabchenko, Idan Attias, Daniel M. Roy

Online Stochastic Linear Optimization under One-bit Feedback

In this paper, we study a special bandit setting of online stochastic linear optimization, where only one-bit of information is revealed to the learner at each round. This problem has found many applications including online advertisement…

Machine Learning · Computer Science 2015-09-28 Lijun Zhang, Tianbao Yang, Rong Jin, Zhi-Hua Zhou

Optimal Regret Algorithm for Pseudo-1d Bandit Convex Optimization

We study online learning with bandit feedback (i.e., the learner has access only to a zeroth-order oracle) where cost/reward functions $f_t$ admit a "pseudo-1d" structure, i.e., $f_t(w) = \mathrm{loss}_t(\mathrm{pred}_t(w))$ where the output of $\mathrm{pred}_t$ is…

Machine Learning · Computer Science 2021-02-16 Aadirupa Saha, Nagarajan Natarajan, Praneeth Netrapalli, Prateek Jain

Online Strongly Convex Optimization with Unknown Delays

We investigate the problem of online convex optimization with unknown delays, in which the feedback of a decision arrives with an arbitrary delay. Previous studies have presented a delayed variant of online gradient descent (OGD), and…

Machine Learning · Computer Science 2021-03-23 Yuanyu Wan, Wei-Wei Tu, Lijun Zhang

A Unified Framework for Analyzing Meta-algorithms in Online Convex Optimization

In this paper, we analyze the problem of online convex optimization in different settings, including different feedback types (full-information, semi-bandit, bandit, etc.) in either the stochastic or the non-stochastic setting and different notions of…

Machine Learning · Computer Science 2026-02-23 Mohammad Pedramfar, Vaneet Aggarwal

Non-Stationary Bandit Convex Optimization: An Optimal Algorithm with Two-Point Feedback

This paper studies bandit convex optimization in non-stationary environments with two-point feedback, using dynamic regret as the performance measure. We propose an algorithm based on bandit mirror descent that extends naturally to…

Optimization and Control · Mathematics 2026-05-26 Chang He, Bo Jiang, Shuzhong Zhang

Recursive Exponential Weighting for Online Non-convex Optimization

In this paper, we investigate the online non-convex optimization problem, which generalizes the classic online convex optimization problem by relaxing the convexity assumption on the cost function. For this type of problem, the classic…

Machine Learning · Computer Science 2017-09-14 Lin Yang, Cheng Tan, Wing Shing Wong

Improved Regret Bounds for Projection-free Bandit Convex Optimization

We revisit the challenge of designing online algorithms for the bandit convex optimization problem (BCO) which are also scalable to high dimensional problems. Hence, we consider algorithms that are projection-free, i.e., based on…

Machine Learning · Computer Science 2019-10-09 Dan Garber, Ben Kretzu

Towards Optimal Regret in Adversarial Linear MDPs with Bandit Feedback

We study online reinforcement learning in linear Markov decision processes with adversarial losses and bandit feedback, without prior knowledge on transitions or access to simulators. We introduce two algorithms that achieve improved regret…

Machine Learning · Computer Science 2023-10-19 Haolin Liu, Chen-Yu Wei, Julian Zimmert

Risk-Averse Stochastic Convex Bandit

Motivated by applications in clinical trials and finance, we study the problem of online convex optimization (with bandit feedback) where the decision maker is risk-averse. We provide two algorithms to solve this problem. The first one is a…

Machine Learning · Computer Science 2018-10-02 Adrian Rivera Cardoso, Huan Xu

Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback

Motivated by applications to online learning in sparse estimation and Bayesian optimization, we consider the problem of online unconstrained nonsubmodular minimization with delayed costs in both full information and bandit feedback…

Machine Learning · Computer Science 2022-06-02 Tianyi Lin, Aldo Pacchiano, Yaodong Yu, Michael I. Jordan

Adaptive Algorithms for Online Convex Optimization with Long-term Constraints

We present an adaptive online gradient descent algorithm to solve online convex optimization problems with long-term constraints, which are constraints that need to be satisfied when accumulated over a finite number of rounds T, but can…

Machine Learning · Statistics 2015-12-24 Rodolphe Jenatton, Jim Huang, Cédric Archambeau

Online Boosting with Bandit Feedback

We consider the problem of online boosting for regression tasks, when only limited information is available to the learner. We give an efficient regret minimization method that has two implications: an online boosting algorithm with noisy…

Machine Learning · Computer Science 2020-07-24 Nataly Brukhim, Elad Hazan