Related papers: Synopsis: Sequential Decision Problems with Weak F…

Sequential Decision Problems with Weak Feedback

This thesis considers sequential decision problems, where the loss/reward incurred by selecting an action may not be inferred from observed feedback. A major part of this thesis focuses on the unsupervised sequential selection problem,…

Machine Learning · Computer Science 2022-12-23 Arun Verma

Sequential Batch Learning in Finite-Action Linear Contextual Bandits

We study the sequential batch learning problem in linear contextual bandits with finite action sets, where the decision maker is constrained to split incoming individuals into (at most) a fixed number of batches and can only observe…

Machine Learning · Computer Science 2020-04-15 Yanjun Han, Zhengqing Zhou, Zhengyuan Zhou, Jose Blanchet, Peter W. Glynn, Yinyu Ye

Optimal Learning for Sequential Decision Making for Expensive Cost Functions with Stochastic Binary Feedbacks

We consider the problem of sequentially making decisions that are rewarded by "successes" and "failures" which can be predicted through an unknown relationship that depends on a partially controllable vector of attributes for each instance.…

Machine Learning · Statistics 2017-09-18 Yingfei Wang, Chu Wang, Warren Powell

Sequential Learning without Feedback

In many security and healthcare systems a sequence of features/sensors/tests are used for detection and diagnosis. Each test outputs a prediction of the latent state, and carries with it inherent costs. Our objective is to learn…

Machine Learning · Computer Science 2016-10-19 Manjesh Hanawal, Csaba Szepesvari, Venkatesh Saligrama

Contextual Semibandits via Supervised Learning Oracles

We study an online decision making problem where on each round a learner chooses a list of items based on some side information, receives a scalar feedback value for each individual item, and a reward that is linearly related to this…

Machine Learning · Computer Science 2016-11-07 Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudik

Linear Partial Monitoring for Sequential Decision-Making: Algorithms, Regret Bounds and Applications

Partial monitoring is an expressive framework for sequential decision-making with an abundance of applications, including graph-structured and dueling bandits, dynamic pricing and transductive feedback models. We survey and extend recent…

Machine Learning · Computer Science 2023-11-15 Johannes Kirschner, Tor Lattimore, Andreas Krause

Online Algorithm for Unsupervised Sequential Selection with Contextual Information

In this paper, we study Contextual Unsupervised Sequential Selection (USS), a new variant of the stochastic contextual bandits problem where the loss of an arm cannot be inferred from the observed feedback. In our setup, arms are associated…

Machine Learning · Computer Science 2020-10-26 Arun Verma, Manjesh K. Hanawal, Csaba Szepesvári, Venkatesh Saligrama

Modeling and Correcting Bias in Sequential Evaluation

We consider the problem of sequential evaluation, in which an evaluator observes candidates in a sequence and assigns scores to these candidates in an online, irrevocable fashion. Motivated by the psychology literature that has studied…

Machine Learning · Statistics 2023-11-20 Jingyan Wang, Ashwin Pananjady

Finding Optimal Arms in Non-stochastic Combinatorial Bandits with Semi-bandit Feedback and Finite Budget

We consider the combinatorial bandits problem with semi-bandit feedback under finite sampling budget constraints, in which the learner can carry out its action only for a limited number of times specified by an overall budget. The action is…

Machine Learning · Computer Science 2022-10-17 Jasmin Brandt, Viktor Bengs, Björn Haddenhorst, Eyke Hüllermeier

Online Learning with Off-Policy Feedback

We study the problem of online learning in adversarial bandit problems under a partial observability model called off-policy feedback. In this sequential decision making problem, the learner cannot directly observe its rewards, but instead…

Machine Learning · Computer Science 2022-07-20 Germano Gabbianelli, Matteo Papini, Gergely Neu

Unveiling Bias in Sequential Decision Making: A Causal Inference Approach for Stochastic Service Systems

In many stochastic service systems, decision-makers find themselves making a sequence of decisions, with the number of decisions being unpredictable. To enhance these decisions, it is crucial to uncover the causal impact these decisions…

Methodology · Statistics 2023-07-18 Juan C. David Gomez, Amy L. Cochran, Gabriel Zayas-Caban

Duelling Bandits with Weak Regret in Adversarial Environments

Research on the multi-armed bandit problem has studied the trade-off of exploration and exploitation in depth. However, there are numerous applications where the cardinal absolute-valued feedback model (e.g. ratings from one to five) is not…

Machine Learning · Computer Science 2018-12-12 Lennard Hilgendorf

Censored Semi-Bandits: A Framework for Resource Allocation with Censored Feedback

In this paper, we study censored Semi-Bandits, a novel variant of the semi-bandits problem. The learner is assumed to have a fixed amount of resources, which it allocates to the arms at each time step. The loss observed from an arm is…

Machine Learning · Computer Science 2020-03-26 Arun Verma, Manjesh K. Hanawal, Arun Rajkumar, Raman Sankaran

Randomized Confidence Bounds for Stochastic Partial Monitoring

The partial monitoring (PM) framework provides a theoretical formulation of sequential learning problems with incomplete feedback. On each round, a learning agent plays an action while the environment simultaneously chooses an outcome. The…

Machine Learning · Computer Science 2024-05-17 Maxime Heuillet, Ola Ahmad, Audrey Durand

Dynamic Learning of Sequential Choice Bandit Problem under Marketing Fatigue

Motivated by the observation that overexposure to unwanted marketing activities leads to customer dissatisfaction, we consider a setting where a platform offers a sequence of messages to its users and is penalized when users abandon the…

Machine Learning · Computer Science 2019-03-21 Junyu Cao, Wei Sun

Non-stationary Delayed Combinatorial Semi-Bandit with Causally Related Rewards

Sequential decision-making under uncertainty is often associated with long feedback delays. Such delays degrade the performance of the learning agent in identifying a subset of arms with the optimal collective reward in the long run. This…

Machine Learning · Computer Science 2023-07-19 Saeed Ghoorchian, Setareh Maghsudi

Online combinatorial optimization with stochastic decision sets and adversarial losses

Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can…

Machine Learning · Computer Science 2026-04-29 Gergely Neu, Michal Valko

Utility-based Dueling Bandits as a Partial Monitoring Game

Partial monitoring is a generic framework for sequential decision-making with incomplete feedback. It encompasses a wide class of problems such as dueling bandits, learning with expert advice, dynamic pricing, dark pools, and label…

Machine Learning · Computer Science 2024-06-27 Pratik Gajane, Tanguy Urvoy

Functional Sequential Treatment Allocation

Consider a setting in which a policy maker assigns subjects to treatments, observing each outcome before the next subject arrives. Initially, it is unknown which treatment is best, but the sequential nature of the problem permits learning…

Econometrics · Economics 2020-08-13 Anders Bredahl Kock, David Preinerstorfer, Bezirgen Veliyev

Weak Signal Asymptotics for Sequentially Randomized Experiments

We use the lens of weak signal asymptotics to study a class of sequentially randomized experiments, including those that arise in solving multi-armed bandit problems. In an experiment with $n$ time steps, we let the mean reward gaps between…

Statistics Theory · Mathematics 2023-06-26 Xu Kuang, Stefan Wager