Related papers: Dynamic Selection in Algorithmic Decision-making
The deployment of Multi-Armed Bandits (MAB) has become commonplace in many economic applications. However, regret guarantees for even state-of-the-art linear bandit algorithms (such as Optimism in the Face of Uncertainty Linear bandit…
Two-sided online matching platforms are employed in various markets. However, agents' preferences in the current market are usually implicit and unknown, thus needing to be learned from data. With the growing availability of dynamic side…
In online learning, the data is provided in a sequential order, and the goal of the learner is to make online decisions to minimize overall regrets. This note is concerned with continuous-time models and algorithms for several online…
We consider the contextual bandit problem where at each time, the agent only has access to a noisy version of the context and the error variance (or an estimator of this variance). This setting is motivated by a wide range of applications…
We consider a bandit problem where at any time, the decision maker can add new arms to her consideration set. A new arm is queried at a cost from an "arm-reservoir" containing finitely many "arm-types," each characterized by a distinct mean…
We consider model selection for sequential decision making in stochastic environments with bandit feedback, where a meta-learner has at its disposal a pool of base learners, and decides on the fly which action to take based on the policies…
The online meta-learning framework has arisen as a powerful tool for the continual lifelong learning setting. The goal for an agent is to quickly learn new tasks by drawing on prior experience, while it faces with tasks one after another.…
This paper studies bandit problems where an agent has access to offline data that might be utilized to potentially improve the estimation of each arm's reward distribution. A major obstacle in this setting is the existence of compound…
We introduce the problem of model selection for contextual bandits, where a learner must adapt to the complexity of the optimal policy while balancing exploration and exploitation. Our main result is a new model selection guarantee for…
This paper establishes a connection between a category of discrete choice models and the realms of online learning and multiarmed bandit algorithms. Our contributions can be summarized in two key aspects. Firstly, we furnish sublinear…
Ranking algorithms are fundamental to various online platforms across e-commerce sites to content streaming services. Our research addresses the challenge of adaptively ranking items from a candidate pool for heterogeneous users, a key…
In this paper, we consider the contextual variant of the MNL-Bandit problem. More specifically, we consider a dynamic set optimization problem, where a decision-maker offers a subset (assortment) of products to a consumer and observes the…
Algorithm selection is typically based on models of algorithm performance, learned during a separate offline training sequence, which can be prohibitively expensive. In recent work, we adopted an online approach, in which a performance…
We present an algorithm guaranteeing dynamic regret bounds for online omniprediction with long term constraints. The goal in this recently introduced problem is for a learner to generate a sequence of predictions which are broadcast to a…
We consider model selection in stochastic bandit and reinforcement learning problems. Given a set of base learning algorithms, an effective model selection strategy adapts to the best learning algorithm in an online fashion. We show that by…
Regret minimization is treated as the golden rule in the traditional study of online learning. However, regret minimization algorithms tend to converge to the static optimum, thus being suboptimal for changing environments. To address this…
Motivated by the phenomenon of strategic agents gaming a recommender system to maximize the number of times they are recommended to users, we study a strategic variant of the linear contextual bandit problem, where the arms can…
We study the problem of dynamic batch learning in high-dimensional sparse linear contextual bandits, where a decision maker, under a given maximum-number-of-batch constraint and only able to observe rewards at the end of each batch, can…
We study the non-stationary stochastic multi-armed bandit problem, where the reward statistics of each arm may change several times during the course of learning. The performance of a learning algorithm is evaluated in terms of their…
Multi-armed bandit algorithms have become a reference solution for handling the explore/exploit dilemma in recommender systems, and many other important real-world problems, such as display advertisement. However, such algorithms usually…