Related papers: Contextual Linear Optimization with Partial Feedba…

Fast Rates for Contextual Linear Optimization

Incorporating side observations in decision making can reduce uncertainty and boost performance, but it also requires we tackle a potentially complex predictive relationship. While one may use off-the-shelf machine learning methods to…

Machine Learning · Statistics 2021-09-01 Yichun Hu, Nathan Kallus, Xiaojie Mao

Bandit Convex Optimization in Non-stationary Environments

Bandit Convex Optimization (BCO) is a fundamental framework for modeling sequential decision-making with partial information, where the only feedback available to the player is the one-point or two-point function values. In this paper, we…

Machine Learning · Computer Science 2020-07-07 Peng Zhao, Guanghui Wang, Lijun Zhang, Zhi-Hua Zhou

BISTRO: An Efficient Relaxation-Based Method for Contextual Bandits

We present efficient algorithms for the problem of contextual bandits with i.i.d. covariates, an arbitrary sequence of rewards, and an arbitrary class of policies. Our algorithm BISTRO requires d calls to the empirical risk minimization…

Machine Learning · Computer Science 2016-02-09 Alexander Rakhlin, Karthik Sridharan

Contextual Inverse Optimization: Offline and Online Learning

We study the problems of offline and online contextual optimization with feedback information, where instead of observing the loss, we observe, after-the-fact, the optimal action an oracle with full knowledge of the objective function would…

Machine Learning · Computer Science 2023-07-04 Omar Besbes, Yuri Fonseca, Ilan Lobel

Latency-Aware Contextual Bandit: Application to Cryo-EM Data Collection

We introduce a latency-aware contextual bandit framework that generalizes the standard contextual bandit problem, where the learner adaptively selects arms and switches decision sets under action delays. In this setting, the learner…

Machine Learning · Statistics 2025-10-10 Lai Wei, Ambuj Tewari, Michael A. Cianfrocco

Early Stopping in Contextual Bandits and Inferences

Bandit algorithms sequentially accumulate data using adaptive sampling policies, offering flexibility for real-world applications. However, excessive sampling can be costly, motivating the development of early stopping methods and reliable…

Statistics Theory · Mathematics 2025-02-06 Zihan Cui

Efficient Optimal Learning for Contextual Bandits

We address the problem of learning in an online setting where the learner repeatedly observes features, selects among a set of actions, and receives reward for the action taken. We provide the first efficient algorithm with an optimal…

Machine Learning · Computer Science 2011-06-17 Miroslav Dudik, Daniel Hsu, Satyen Kale, Nikos Karampatziakis, John Langford, Lev Reyzin, Tong Zhang

Adaptive Exploration in Linear Contextual Bandit

Contextual bandits serve as a fundamental model for many sequential decision making tasks. The most popular theoretically justified approaches are based on the optimism principle. While these algorithms can be practical, they are known to…

Machine Learning · Computer Science 2020-03-17 Botao Hao, Tor Lattimore, Csaba Szepesvari

Cramming Contextual Bandits for On-policy Statistical Evaluation

We introduce the cram method as a general statistical framework for evaluating the final learned policy from a multi-armed contextual bandit algorithm, using the dataset generated by the same bandit algorithm. The proposed on-policy…

Machine Learning · Computer Science 2025-04-16 Zeyang Jia, Kosuke Imai, Michael Lingzhi Li

Counterfactual Risk Minimization: Learning from Logged Bandit Feedback

We develop a learning principle and an efficient algorithm for batch learning from logged bandit feedback. This learning setting is ubiquitous in online systems (e.g., ad placement, web search, recommendation), where an algorithm makes a…

Machine Learning · Computer Science 2015-05-22 Adith Swaminathan, Thorsten Joachims

Stochastic Contextual Dueling Bandits under Linear Stochastic Transitivity Models

We consider the regret minimization task in a dueling bandits problem with context information. In every round of the sequential decision problem, the learner makes a context-dependent selection of two choice alternatives (arms) to be…

Machine Learning · Computer Science 2022-10-14 Viktor Bengs, Aadirupa Saha, Eyke Hüllermeier

Contextual Semibandits via Supervised Learning Oracles

We study an online decision making problem where on each round a learner chooses a list of items based on some side information, receives a scalar feedback value for each individual item, and a reward that is linearly related to this…

Machine Learning · Computer Science 2016-11-07 Akshay Krishnamurthy, Alekh Agarwal, Miroslav Dudik

Learning from Bandit Feedback: An Overview of the State-of-the-art

In machine learning we often try to optimise a decision rule that would have worked well over a historical dataset; this is the so-called empirical risk minimisation principle. In the context of learning from recommender system logs,…

Information Retrieval · Computer Science 2019-09-19 Olivier Jeunen, Dmytro Mykhaylov, David Rohde, Flavian Vasile, Alexandre Gilotte, Martin Bompaire

Online Multi-LLM Selection via Contextual Bandits under Unstructured Context Evolution

Large language models (LLMs) exhibit diverse response behaviors, costs, and strengths, making it challenging to select the most suitable LLM for a given user query. We study the problem of adaptive multi-LLM selection in an online setting,…

Machine Learning · Computer Science 2025-06-24 Manhin Poon, XiangXiang Dai, Xutong Liu, Fang Kong, John C. S. Lui, Jinhang Zuo

OSOM: A simultaneously optimal algorithm for multi-armed and linear contextual bandits

We consider the stochastic linear (multi-armed) contextual bandit problem with the possibility of hidden simple multi-armed bandit structure in which the rewards are independent of the contextual information. Algorithms that are designed…

Machine Learning · Statistics 2020-10-07 Niladri S. Chatterji, Vidya Muthukumar, Peter L. Bartlett

Adaptive LLM Routing under Budget Constraints

Large Language Models (LLMs) have revolutionized natural language processing, but their varying capabilities and costs pose challenges in practical applications. LLM routing addresses this by dynamically selecting the most suitable LLM for…

Machine Learning · Computer Science 2025-09-10 Pranoy Panda, Raghav Magazine, Chaitanya Devaguptapu, Sho Takemori, Vishal Sharma

Constrained Online Convex Optimization with Memory and Predictions

We study Constrained Online Convex Optimization with Memory (COCO-M), where both the loss and the constraints depend on a finite window of past decisions made by the learner. This setting extends the previously studied unconstrained online…

Machine Learning · Computer Science 2026-03-24 Mohammed Abdullah, George Iosifidis, Salah Eddine Elayoubi, Tijani Chahed

Online Stochastic Linear Optimization under One-bit Feedback

In this paper, we study a special bandit setting of online stochastic linear optimization, where only one bit of information is revealed to the learner at each round. This problem has found many applications including online advertisement…

Machine Learning · Computer Science 2015-09-28 Lijun Zhang, Tianbao Yang, Rong Jin, Zhi-Hua Zhou

Interactively Learning Preference Constraints in Linear Bandits

We study sequential decision-making with known rewards and unknown constraints, motivated by situations where the constraints represent expensive-to-evaluate human preferences, such as safe and comfortable driving behavior. We formalize the…

Machine Learning · Computer Science 2022-06-13 David Lindner, Sebastian Tschiatschek, Katja Hofmann, Andreas Krause

Learning to Route LLMs from Bandit Feedback: One Policy, Many Trade-offs

Efficient use of large language models (LLMs) is critical for deployment at scale: without adaptive routing, systems either overpay for strong models or risk poor performance from weaker ones. Selecting the right LLM for each query is…

Machine Learning · Computer Science 2025-10-10 Wang Wei, Tiankai Yang, Hongjie Chen, Yue Zhao, Franck Dernoncourt, Ryan A. Rossi, Hoda Eldardiry