Related papers: Synopsis: Sequential Decision Problems with Weak F…
This thesis considers sequential decision problems, where the loss/reward incurred by selecting an action may not be inferred from observed feedback. A major part of this thesis focuses on the unsupervised sequential selection problem,…
We study the sequential batch learning problem in linear contextual bandits with finite action sets, where the decision maker is constrained to split incoming individuals into (at most) a fixed number of batches and can only observe…
We consider the problem of sequentially making decisions that are rewarded by "successes" and "failures" which can be predicted through an unknown relationship that depends on a partially controllable vector of attributes for each instance.…
In many security and healthcare systems a sequence of features/sensors/tests are used for detection and diagnosis. Each test outputs a prediction of the latent state, and carries with it inherent costs. Our objective is to {\it learn}…
We study an online decision making problem where on each round a learner chooses a list of items based on some side information, receives a scalar feedback value for each individual item, and a reward that is linearly related to this…
Partial monitoring is an expressive framework for sequential decision-making with an abundance of applications, including graph-structured and dueling bandits, dynamic pricing and transductive feedback models. We survey and extend recent…
In this paper, we study Contextual Unsupervised Sequential Selection (USS), a new variant of the stochastic contextual bandits problem where the loss of an arm cannot be inferred from the observed feedback. In our setup, arms are associated…
We consider the problem of sequential evaluation, in which an evaluator observes candidates in a sequence and assigns scores to these candidates in an online, irrevocable fashion. Motivated by the psychology literature that has studied…
We consider the combinatorial bandits problem with semi-bandit feedback under finite sampling budget constraints, in which the learner can carry out its action only for a limited number of times specified by an overall budget. The action is…
We study the problem of online learning in adversarial bandit problems under a partial observability model called off-policy feedback. In this sequential decision making problem, the learner cannot directly observe its rewards, but instead…
In many stochastic service systems, decision-makers find themselves making a sequence of decisions, with the number of decisions being unpredictable. To enhance these decisions, it is crucial to uncover the causal impact these decisions…
Research on the multi-armed bandit problem has studied the trade-off of exploration and exploitation in depth. However, there are numerous applications where the cardinal absolute-valued feedback model (e.g. ratings from one to five) is not…
In this paper, we study censored Semi-Bandits, a novel variant of the semi-bandits problem. The learner is assumed to have a fixed amount of resources, which it allocates to the arms at each time step. The loss observed from an arm is…
The partial monitoring (PM) framework provides a theoretical formulation of sequential learning problems with incomplete feedback. On each round, a learning agent plays an action while the environment simultaneously chooses an outcome. The…
Motivated by the observation that overexposure to unwanted marketing activities leads to customer dissatisfaction, we consider a setting where a platform offers a sequence of messages to its users and is penalized when users abandon the…
Sequential decision-making under uncertainty is often associated with long feedback delays. Such delays degrade the performance of the learning agent in identifying a subset of arms with the optimal collective reward in the long run. This…
Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can…
Partial monitoring is a generic framework for sequential decision-making with incomplete feedback. It encompasses a wide class of problems such as dueling bandits, learning with expect advice, dynamic pricing, dark pools, and label…
Consider a setting in which a policy maker assigns subjects to treatments, observing each outcome before the next subject arrives. Initially, it is unknown which treatment is best, but the sequential nature of the problem permits learning…
We use the lens of weak signal asymptotics to study a class of sequentially randomized experiments, including those that arise in solving multi-armed bandit problems. In an experiment with $n$ time steps, we let the mean reward gaps between…