Related papers: Dynamic Selection in Algorithmic Decision-making

Causal Bandits: Online Decision-Making in Endogenous Settings

The deployment of Multi-Armed Bandits (MAB) has become commonplace in many economic applications. However, regret guarantees for even state-of-the-art linear bandit algorithms (such as Optimism in the Face of Uncertainty Linear bandit…

Econometrics · Economics · 2023-02-28 · Jingwen Zhang, Yifang Chen, Amandeep Singh

Dynamic Matching Bandit For Two-Sided Online Markets

Two-sided online matching platforms are employed in various markets. However, agents' preferences in the current market are usually implicit and unknown, thus needing to be learned from data. With the growing availability of dynamic side…

Machine Learning · Computer Science · 2024-05-30 · Yuantong Li, Chi-hua Wang, Guang Cheng, Will Wei Sun

A note on continuous-time online learning

In online learning, the data is provided in a sequential order, and the goal of the learner is to make online decisions to minimize overall regret. This note is concerned with continuous-time models and algorithms for several online…

Machine Learning · Statistics · 2024-05-20 · Lexing Ying

Online learning in bandits with predicted context

We consider the contextual bandit problem where at each time, the agent only has access to a noisy version of the context and the error variance (or an estimator of this variance). This setting is motivated by a wide range of applications…

Machine Learning · Statistics · 2024-03-19 · Yongyi Guo, Ziping Xu, Susan Murphy

Bandits with Dynamic Arm-acquisition Costs

We consider a bandit problem where at any time, the decision maker can add new arms to her consideration set. A new arm is queried at a cost from an "arm-reservoir" containing finitely many "arm-types," each characterized by a distinct mean…

Machine Learning · Computer Science · 2022-10-10 · Anand Kalvit, Assaf Zeevi

Data-Driven Online Model Selection With Regret Guarantees

We consider model selection for sequential decision making in stochastic environments with bandit feedback, where a meta-learner has at its disposal a pool of base learners, and decides on the fly which action to take based on the policies…

Machine Learning · Computer Science · 2024-01-24 · Aldo Pacchiano, Christoph Dann, Claudio Gentile

Dynamic Regret Analysis for Online Meta-Learning

The online meta-learning framework has arisen as a powerful tool for the continual lifelong learning setting. The goal for an agent is to quickly learn new tasks by drawing on prior experience, while it faces tasks one after another.…

Machine Learning · Computer Science · 2021-09-30 · Parvin Nazari, Esmaile Khorram

Robustly Improving Bandit Algorithms with Confounded and Selection Biased Offline Data: A Causal Approach

This paper studies bandit problems where an agent has access to offline data that might be utilized to potentially improve the estimation of each arm's reward distribution. A major obstacle in this setting is the existence of compound…

Machine Learning · Computer Science · 2023-12-21 · Wen Huang, Xintao Wu

Model selection for contextual bandits

We introduce the problem of model selection for contextual bandits, where a learner must adapt to the complexity of the optimal policy while balancing exploration and exploitation. Our main result is a new model selection guarantee for…

Machine Learning · Computer Science · 2019-11-15 · Dylan J. Foster, Akshay Krishnamurthy, Haipeng Luo

Discrete Choice Multi-Armed Bandits

This paper establishes a connection between a category of discrete choice models and the realms of online learning and multiarmed bandit algorithms. Our contributions can be summarized in two key aspects. Firstly, we furnish sublinear…

Machine Learning · Statistics · 2023-10-03 · Emerson Melo, David Müller

Adaptively Learning to Select-Rank in Online Platforms

Ranking algorithms are fundamental to various online platforms, from e-commerce sites to content streaming services. Our research addresses the challenge of adaptively ranking items from a candidate pool for heterogeneous users, a key…

Machine Learning · Computer Science · 2024-06-10 · Jingyuan Wang, Perry Dong, Ying Jin, Ruohan Zhan, Zhengyuan Zhou

A Tractable Online Learning Algorithm for the Multinomial Logit Contextual Bandit

In this paper, we consider the contextual variant of the MNL-Bandit problem. More specifically, we consider a dynamic set optimization problem, where a decision-maker offers a subset (assortment) of products to a consumer and observes the…

Machine Learning · Computer Science · 2024-04-16 · Priyank Agrawal, Theja Tulabandhula, Vashist Avadhanula

Algorithm Selection as a Bandit Problem with Unbounded Losses

Algorithm selection is typically based on models of algorithm performance, learned during a separate offline training sequence, which can be prohibitively expensive. In recent work, we adopted an online approach, in which a performance…

Artificial Intelligence · Computer Science · 2013-01-31 · Matteo Gagliolo, Juergen Schmidhuber

Dynamic Regret Bounds for Online Omniprediction with Long Term Constraints

We present an algorithm guaranteeing dynamic regret bounds for online omniprediction with long term constraints. The goal in this recently introduced problem is for a learner to generate a sequence of predictions which are broadcast to a…

Machine Learning · Computer Science · 2025-10-09 · Yahav Bechavod, Jiuyao Lu, Aaron Roth

Regret Balancing for Bandit and RL Model Selection

We consider model selection in stochastic bandit and reinforcement learning problems. Given a set of base learning algorithms, an effective model selection strategy adapts to the best learning algorithm in an online fashion. We show that by…

Machine Learning · Computer Science · 2020-06-11 · Yasin Abbasi-Yadkori, Aldo Pacchiano, My Phan

Minimizing Dynamic Regret and Adaptive Regret Simultaneously

Regret minimization is treated as the golden rule in the traditional study of online learning. However, regret minimization algorithms tend to converge to the static optimum, thus being suboptimal for changing environments. To address this…

Machine Learning · Computer Science · 2020-02-07 · Lijun Zhang, Shiyin Lu, Tianbao Yang

Strategic Linear Contextual Bandits

Motivated by the phenomenon of strategic agents gaming a recommender system to maximize the number of times they are recommended to users, we study a strategic variant of the linear contextual bandit problem, where the arms can…

Machine Learning · Computer Science · 2024-09-27 · Thomas Kleine Buening, Aadirupa Saha, Christos Dimitrakakis, Haifeng Xu

Dynamic Batch Learning in High-Dimensional Sparse Linear Contextual Bandits

We study the problem of dynamic batch learning in high-dimensional sparse linear contextual bandits, where a decision maker, under a given maximum-number-of-batch constraint and only able to observe rewards at the end of each batch, can…

Machine Learning · Statistics · 2022-07-19 · Zhimei Ren, Zhengyuan Zhou

A New Look at Dynamic Regret for Non-Stationary Stochastic Bandits

We study the non-stationary stochastic multi-armed bandit problem, where the reward statistics of each arm may change several times during the course of learning. The performance of a learning algorithm is evaluated in terms of its…

Machine Learning · Computer Science · 2022-03-09 · Yasin Abbasi-Yadkori, Andras Gyorgy, Nevena Lazic

Learning Contextual Bandits in a Non-stationary Environment

Multi-armed bandit algorithms have become a reference solution for handling the explore/exploit dilemma in recommender systems, and many other important real-world problems, such as display advertisement. However, such algorithms usually…

Machine Learning · Computer Science · 2018-05-25 · Qingyun Wu, Naveen Iyer, Hongning Wang