Related papers: Optimal anytime regret with two experts

200 papers

We address the problem of learning in an online setting where the learner repeatedly observes features, selects among a set of actions, and receives reward for the action taken. We provide the first efficient algorithm with an optimal…

Machine Learning · Computer Science · 2011-06-17 · Miroslav Dudik, Daniel Hsu, Satyen Kale, Nikos Karampatziakis, John Langford, Lev Reyzin, Tong Zhang

This work studies linear bandits under a new notion of gap-adjusted misspecification and is an extension of Liu et al. (2023). When the underlying reward function is not linear, existing work on linear bandits usually relies on a uniform…

Machine Learning · Computer Science · 2025-01-10 · Chong Liu, Dan Qiao, Ming Yin, Ilija Bogunovic, Yu-Xiang Wang

This paper considers a stochastic Multi-Armed Bandit (MAB) problem with dual objectives: (i) quick identification and commitment to the optimal arm, and (ii) reward maximization throughout a sequence of $T$ consecutive rounds. Though each…

Machine Learning · Computer Science · 2024-05-31 · Qining Zhang, Lei Ying

The parallel machine scheduling problem has been a popular topic for many years due to its theoretical and practical importance. This paper addresses the robust makespan optimization problem on unrelated parallel machine scheduling with…

Optimization and Control · Mathematics · 2020-10-23 · Chutong Gao, Weihao Wang, Leyuan Shi

We study the problem of uncertainty quantification via prediction sets, in an online setting where the data distribution may vary arbitrarily over time. Recent work develops online conformal prediction techniques that leverage regret…

Machine Learning · Computer Science · 2023-02-16 · Aadyot Bhatnagar, Huan Wang, Caiming Xiong, Yu Bai

We study the problem of online prediction, in which at each time step $t$, an individual $x_t$ arrives, whose label we must predict. Each individual is associated with various groups, defined based on their features such as age, sex, race…

Machine Learning · Computer Science · 2023-10-10 · Krishna Acharya, Eshwar Ram Arunachaleswaran, Sampath Kannan, Aaron Roth, Juba Ziani

This paper studies bandit convex optimization in non-stationary environments with two-point feedback, using dynamic regret as the performance measure. We propose an algorithm based on bandit mirror descent that extends naturally to…

Optimization and Control · Mathematics · 2026-05-26 · Chang He, Bo Jiang, Shuzhong Zhang

We consider a robust aggregation problem in the presence of both truthful and adversarial experts. The truthful experts will report their private signals truthfully, while the adversarial experts can report arbitrarily. We assume experts…

Machine Learning · Computer Science · 2025-02-07 · Yongkang Guo, Yuqing Kong

We study the stochastic linear optimization problem with bandit feedback. The set of arms takes values in an $N$-dimensional space and belongs to a bounded polyhedron described by finitely many linear inequalities. We provide a lower bound for…

Machine Learning · Computer Science · 2015-09-29 · Manjesh K. Hanawal, Amir Leshem, Venkatesh Saligrama

In this paper, we present a learning algorithm that achieves asymptotically optimal regret for Markov decision processes in average reward under a communicating assumption. That is, given a communicating Markov decision process $M$, our…

Machine Learning · Computer Science · 2025-05-26 · Victor Boone

The problem of bandit with graph feedback generalizes both the multi-armed bandit (MAB) problem and the learning with expert advice problem by encoding in a directed graph how the loss vector can be observed in each round of the game. The…

Machine Learning · Computer Science · 2023-08-07 · Yuchen He, Chihao Zhang
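The graph-feedback model in the entry above can be illustrated with a small sketch. It assumes the common convention that an edge `i -> j` means "playing arm `i` reveals the loss of arm `j`"; the helper names (`observed_arms`, `bandit_graph`, `experts_graph`, `play_round`) are hypothetical and not taken from the paper. The two special cases show how the model generalizes both settings: a graph with only self-loops gives the multi-armed bandit, and the complete graph gives learning with expert advice.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # hypothetical encoding: arm -> set of arms whose loss it reveals


def observed_arms(graph: Graph, played: int) -> Set[int]:
    # Playing `played` reveals the loss of every out-neighbour of `played`.
    return set(graph.get(played, set()))


def bandit_graph(k: int) -> Graph:
    # Multi-armed bandit: only self-loops, so only the played arm's loss is seen.
    return {i: {i} for i in range(k)}


def experts_graph(k: int) -> Graph:
    # Experts problem: complete graph with self-loops, so the full loss vector is seen.
    return {i: set(range(k)) for i in range(k)}


def play_round(graph: Graph, played: int, losses: List[float]) -> Dict[int, float]:
    # The learner incurs losses[played]; this returns only the coordinates it observes.
    return {j: losses[j] for j in sorted(observed_arms(graph, played))}
```

For example, `play_round(bandit_graph(3), 1, [0.2, 0.5, 0.9])` returns `{1: 0.5}`, while the same call on `experts_graph(3)` returns all three losses. Graphs in between these extremes, possibly without some self-loops, are what make the general problem harder to analyze.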

Stochastic linear bandits are a fundamental model for sequential decision making, where an agent selects a vector-valued action and receives a noisy reward with expected value given by an unknown linear function. Although well studied in…

Machine Learning · Computer Science · 2025-06-23 · Bruce Huang, Ruida Zhou, Lin F. Yang, Suhas Diggavi

We study the problem of worst-case regret in piecewise-stationary multi-armed bandits. While the minimax theory for stationary bandits is well established, understanding analogous limits in time-varying settings is challenging. Existing…

Machine Learning · Computer Science · 2025-11-11 · Gal Mendelson, Eyal Tadmor

This study considers online learning with general directed feedback graphs. For this problem, we present best-of-both-worlds algorithms that achieve nearly tight regret bounds for adversarial environments as well as poly-logarithmic regret…

Machine Learning · Computer Science · 2022-12-29 · Shinji Ito, Taira Tsuchiya, Junya Honda

Motivated by learning of correlated equilibria in non-cooperative games, we perform a large deviations analysis of a regret minimizing stochastic approximation algorithm. The regret minimization algorithm we consider comprises multiple…

Optimization and Control · Mathematics · 2024-06-04 · Hongjiang Qian, Vikram Krishnamurthy

Robust optimization is a widely studied area in operations research, where the algorithm takes as input a range of values and outputs a single solution that performs well for the entire range. Specifically, a robust algorithm aims to…

Data Structures and Algorithms · Computer Science · 2020-05-19 · Arun Ganesh, Bruce M. Maggs, Debmalya Panigrahi

This paper investigates the problem of regret minimization in linear time-varying (LTV) dynamical systems. Due to the simultaneous presence of uncertainty and non-stationarity, designing online control algorithms for unknown LTV systems…

Machine Learning · Computer Science · 2022-06-07 · Yuzhen Han, Ruben Solozabal, Jing Dong, Xingyu Zhou, Martin Takac, Bin Gu

We propose a simple model selection approach for algorithms in stochastic bandit and reinforcement learning problems. As opposed to prior work that (implicitly) assumes knowledge of the optimal regret, we only require that each base…

Machine Learning · Computer Science · 2020-12-25 · Aldo Pacchiano, Christoph Dann, Claudio Gentile, Peter Bartlett

Performance of adaptive control policies is assessed through the regret with respect to the optimal regulator, which reflects the increase in the operating cost due to uncertainty about the dynamics parameters. However, available results in…

Systems and Control · Computer Science · 2020-03-24 · Mohamad Kazem Shirani Faradonbeh, Ambuj Tewari, George Michailidis

In the online learning with experts problem, an algorithm must make a prediction about an outcome on each of $T$ days (or times), given a set of $n$ experts who make predictions on each day (or time). The algorithm is given feedback on the…

Data Structures and Algorithms · Computer Science · 2023-03-06 · David P. Woodruff, Fred Zhang, Samson Zhou
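The experts setting described in the entry above (n experts, T days, full feedback after each day) can be made concrete with the textbook Hedge (multiplicative weights) algorithm. This is a minimal sketch of that standard algorithm, not the method of the listed paper; the function names `hedge` and `regret` are hypothetical, and losses are assumed to lie in [0, 1]. With n = 2 it covers the two-expert case in this page's title, although the fixed learning rate below assumes the horizon T is known in advance, so it is not an anytime algorithm.

```python
import math
from typing import List, Sequence


def hedge(losses: Sequence[Sequence[float]], eta: float) -> List[List[float]]:
    """Run Hedge on a T x n loss matrix (losses assumed in [0, 1]).

    Returns the probability distribution over the n experts used on each day.
    """
    n = len(losses[0])
    weights = [1.0] * n
    history: List[List[float]] = []
    for day_losses in losses:
        total = sum(weights)
        probs = [w / total for w in weights]
        history.append(probs)
        # Feedback: every expert's loss is revealed, and each weight shrinks
        # exponentially in that loss. Starting from the normalized probabilities
        # keeps the weights in a safe floating-point range over long horizons.
        weights = [p * math.exp(-eta * l) for p, l in zip(probs, day_losses)]
    return history


def regret(losses: Sequence[Sequence[float]], history: List[List[float]]) -> float:
    # Expected loss of the algorithm minus the loss of the best single expert in hindsight.
    alg = sum(sum(p * l for p, l in zip(probs, day)) for probs, day in zip(history, losses))
    best = min(sum(day[i] for day in losses) for i in range(len(losses[0])))
    return alg - best
```

With the standard tuning $\eta = \sqrt{8 \ln n / T}$, Hedge's regret is at most $\sqrt{(T/2) \ln n}$ for losses in [0, 1]. For example, with T = 100 days, two experts, and expert 0 always correct, `regret(losses, hedge(losses, math.sqrt(8 * math.log(2) / 100)))` stays below that bound of about 5.89.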