Related papers: Synopsis: Sequential Decision Problems with Weak F…

200 papers

This thesis considers sequential decision problems, where the loss/reward incurred by selecting an action may not be inferred from observed feedback. A major part of this thesis focuses on the unsupervised sequential selection problem,…

Machine Learning · Computer Science 2022-12-23 Arun Verma

We study the sequential batch learning problem in linear contextual bandits with finite action sets, where the decision maker is constrained to split incoming individuals into (at most) a fixed number of batches and can only observe…

Machine Learning · Computer Science 2020-04-15 Yanjun Han, Zhengqing Zhou, Zhengyuan Zhou, Jose Blanchet, Peter W. Glynn, Yinyu Ye

We consider the problem of sequentially making decisions that are rewarded by "successes" and "failures" which can be predicted through an unknown relationship that depends on a partially controllable vector of attributes for each instance.…

Machine Learning · Statistics 2017-09-18 Yingfei Wang, Chu Wang, Warren Powell

In many security and healthcare systems a sequence of features/sensors/tests are used for detection and diagnosis. Each test outputs a prediction of the latent state, and carries with it inherent costs. Our objective is to learn…

Machine Learning · Computer Science 2016-10-19 Manjesh Hanawal, Csaba Szepesvari, Venkatesh Saligrama

We study an online decision making problem where on each round a learner chooses a list of items based on some side information, receives a scalar feedback value for each individual item, and a reward that is linearly related to this…

Machine Learning · Computer Science 2016-11-07 Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudik

Partial monitoring is an expressive framework for sequential decision-making with an abundance of applications, including graph-structured and dueling bandits, dynamic pricing and transductive feedback models. We survey and extend recent…

Machine Learning · Computer Science 2023-11-15 Johannes Kirschner, Tor Lattimore, Andreas Krause

In this paper, we study Contextual Unsupervised Sequential Selection (USS), a new variant of the stochastic contextual bandits problem where the loss of an arm cannot be inferred from the observed feedback. In our setup, arms are associated…

Machine Learning · Computer Science 2020-10-26 Arun Verma, Manjesh K. Hanawal, Csaba Szepesvári, Venkatesh Saligrama

We consider the problem of sequential evaluation, in which an evaluator observes candidates in a sequence and assigns scores to these candidates in an online, irrevocable fashion. Motivated by the psychology literature that has studied…

Machine Learning · Statistics 2023-11-20 Jingyan Wang, Ashwin Pananjady

We consider the combinatorial bandits problem with semi-bandit feedback under finite sampling budget constraints, in which the learner can carry out its action only for a limited number of times specified by an overall budget. The action is…

Machine Learning · Computer Science 2022-10-17 Jasmin Brandt, Viktor Bengs, Björn Haddenhorst, Eyke Hüllermeier

We study the problem of online learning in adversarial bandit problems under a partial observability model called off-policy feedback. In this sequential decision making problem, the learner cannot directly observe its rewards, but instead…

Machine Learning · Computer Science 2022-07-20 Germano Gabbianelli, Matteo Papini, Gergely Neu

In many stochastic service systems, decision-makers find themselves making a sequence of decisions, with the number of decisions being unpredictable. To enhance these decisions, it is crucial to uncover the causal impact these decisions…

Methodology · Statistics 2023-07-18 Juan C. David Gomez, Amy L. Cochran, Gabriel Zayas-Caban

Research on the multi-armed bandit problem has studied the trade-off of exploration and exploitation in depth. However, there are numerous applications where the cardinal absolute-valued feedback model (e.g. ratings from one to five) is not…

Machine Learning · Computer Science 2018-12-12 Lennard Hilgendorf

In this paper, we study censored Semi-Bandits, a novel variant of the semi-bandits problem. The learner is assumed to have a fixed amount of resources, which it allocates to the arms at each time step. The loss observed from an arm is…

Machine Learning · Computer Science 2020-03-26 Arun Verma, Manjesh K. Hanawal, Arun Rajkumar, Raman Sankaran

The partial monitoring (PM) framework provides a theoretical formulation of sequential learning problems with incomplete feedback. On each round, a learning agent plays an action while the environment simultaneously chooses an outcome. The…

Machine Learning · Computer Science 2024-05-17 Maxime Heuillet, Ola Ahmad, Audrey Durand

Motivated by the observation that overexposure to unwanted marketing activities leads to customer dissatisfaction, we consider a setting where a platform offers a sequence of messages to its users and is penalized when users abandon the…

Machine Learning · Computer Science 2019-03-21 Junyu Cao, Wei Sun

Sequential decision-making under uncertainty is often associated with long feedback delays. Such delays degrade the performance of the learning agent in identifying a subset of arms with the optimal collective reward in the long run. This…

Machine Learning · Computer Science 2023-07-19 Saeed Ghoorchian, Setareh Maghsudi

Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can…

Machine Learning · Computer Science 2026-04-29 Gergely Neu, Michal Valko

Partial monitoring is a generic framework for sequential decision-making with incomplete feedback. It encompasses a wide class of problems such as dueling bandits, learning with expert advice, dynamic pricing, dark pools, and label…

Machine Learning · Computer Science 2024-06-27 Pratik Gajane, Tanguy Urvoy

Consider a setting in which a policy maker assigns subjects to treatments, observing each outcome before the next subject arrives. Initially, it is unknown which treatment is best, but the sequential nature of the problem permits learning…

Econometrics · Economics 2020-08-13 Anders Bredahl Kock, David Preinerstorfer, Bezirgen Veliyev

We use the lens of weak signal asymptotics to study a class of sequentially randomized experiments, including those that arise in solving multi-armed bandit problems. In an experiment with $n$ time steps, we let the mean reward gaps between…

Statistics Theory · Mathematics 2023-06-26 Xu Kuang, Stefan Wager