Related papers: Bandit Data-Driven Optimization

Contextual Bandits with Budgeted Information Reveal

Contextual bandit algorithms are commonly used in digital health to recommend personalized treatments. However, to ensure the effectiveness of the treatments, patients are often requested to take actions that have no immediate benefit to…

Machine Learning · Computer Science 2024-03-14 Kyra Gan, Esmaeil Keyvanshokooh, Xueqing Liu, Susan Murphy

Budgeted and Non-budgeted Causal Bandits

Learning good interventions in a causal graph can be modelled as a stochastic multi-armed bandit problem with side-information. First, we study this problem when interventions are more expensive than observations and a budget is specified.…

Machine Learning · Computer Science 2020-12-15 Vineet Nair, Vishakha Patil, Gaurav Sinha

Identifiable Latent Bandits: Leveraging observational data for personalized decision-making

Sequential decision-making algorithms such as multi-armed bandits can find optimal personalized decisions, but are notoriously sample-hungry. In personalized medicine, for example, training a bandit from scratch for every patient is…

Machine Learning · Computer Science 2026-05-12 Ahmet Zahid Balcıoğlu, Newton Mwai, Emil Carlsson, Fredrik D. Johansson

An Algorithmic Framework to Control Bias in Bandit-based Personalization

Personalization is pervasive in the online space as it leads to higher efficiency and revenue by allowing the most relevant content to be served to each user. However, recent studies suggest that personalization methods can propagate…

Machine Learning · Computer Science 2018-02-26 L. Elisa Celis, Sayash Kapoor, Farnood Salehi, Nisheeth K. Vishnoi

Bandits with Partially Observable Confounded Data

We study linear contextual bandits with access to a large, confounded, offline dataset that was sampled from some fixed policy. We show that this problem is closely related to a variant of the bandit problem with side information. We…

Machine Learning · Computer Science 2021-08-11 Guy Tennenholtz, Uri Shalit, Shie Mannor, Yonathan Efroni

Causal Bandits: Online Decision-Making in Endogenous Settings

The deployment of Multi-Armed Bandits (MAB) has become commonplace in many economic applications. However, regret guarantees for even state-of-the-art linear bandit algorithms (such as Optimism in the Face of Uncertainty Linear bandit…

Econometrics · Economics 2023-02-28 Jingwen Zhang, Yifang Chen, Amandeep Singh

Recommending with Recommendations

Recommendation systems are a key modern application of machine learning, but they have the downside that they often draw upon sensitive user information in making their predictions. We show how to address this deficiency by basing a…

Machine Learning · Computer Science 2021-12-03 Naveen Durvasula, Franklyn Wang, Scott Duke Kominers

Efficient Contextual Bandits with Uninformed Feedback Graphs

Bandits with feedback graphs are powerful online learning models that interpolate between the full information and classic bandit problems, capturing many real-life applications. A recent work by Zhang et al. (2023) studies the contextual…

Machine Learning · Computer Science 2024-02-14 Mengxiao Zhang, Yuheng Zhang, Haipeng Luo, Paul Mineiro

Combining Offline Causal Inference and Online Bandit Learning for Data Driven Decision

A fundamental question for companies with a large amount of logged data is: How to use such logged data together with incoming streaming data to make good decisions? Many companies currently make decisions via online A/B tests, but wrong…

Machine Learning · Computer Science 2020-11-10 Li Ye, Yishi Lin, Hong Xie, John C. S. Lui

Semi-bandit Optimization in the Dispersed Setting

The goal of data-driven algorithm design is to obtain high-performing algorithms for specific application domains using machine learning and data. Across many fields in AI, science, and engineering, practitioners will often fix a family of…

Machine Learning · Computer Science 2020-12-22 Maria-Florina Balcan, Travis Dick, Wesley Pegden

Extending Open Bandit Pipeline to Simulate Industry Challenges

Bandit algorithms are often used in the e-commerce industry to train Machine Learning (ML) systems when pre-labeled data is unavailable. However, the industry setting poses various challenges that make implementing bandit algorithms in…

Machine Learning · Computer Science 2022-09-12 Bram van den Akker, Niklas Weber, Felipe Moraes, Dmitri Goldenberg

Efficient learning by implicit exploration in bandit problems with side observations

We consider online learning problems under a partial observability model capturing situations where the information conveyed to the learner is between full information and bandit feedback. In the simplest variant, we assume that in addition…

Machine Learning · Computer Science 2026-04-28 Tomas Kocak, Gergely Neu, Michal Valko, Remi Munos

Mixed-Integer Optimization with Constraint Learning

We establish a broad methodological foundation for mixed-integer optimization with learned constraints. We propose an end-to-end pipeline for data-driven decision making in which constraints and objectives are directly learned from data…

Optimization and Control · Mathematics 2023-10-30 Donato Maragno, Holly Wiberg, Dimitris Bertsimas, S. Ilker Birbil, Dick den Hertog, Adejuyigbe Fajemisin

Regret Minimization with Performative Feedback

In performative prediction, the deployment of a predictive model triggers a shift in the data distribution. As these shifts are typically unknown ahead of time, the learner needs to deploy a model to get feedback about the distribution it…

Machine Learning · Computer Science 2022-07-19 Meena Jagadeesan, Tijana Zrnic, Celestine Mendler-Dünner

Prompt-Tuning Bandits: Enabling Few-Shot Generalization for Efficient Multi-Task Offline RL

Prompting has emerged as the dominant paradigm for adapting large, pre-trained transformer-based models to downstream tasks. The Prompting Decision Transformer (PDT) enables large-scale, multi-task offline Reinforcement Learning (RL)…

Machine Learning · Computer Science 2025-07-21 Finn Rietz, Oleg Smirnov, Sara Karimi, Lele Cao

Efficient Multi-objective Prompt Optimization via Pure-exploration Bandits

Prompt engineering has become central to eliciting the capabilities of large language models (LLMs). At its core lies prompt selection -- efficiently identifying the most effective prompts. However, most prior investigations overlook a key…

Machine Learning · Computer Science 2026-05-15 Donghao Li, Chengshuai Shi, Weijuan Ou, Cong Shen, Jing Yang

Practical Bandits: An Industry Perspective

The bandit paradigm provides a unified modeling framework for problems that require decision-making under uncertainty. Because many business metrics can be viewed as rewards (a.k.a. utilities) that result from actions, bandit algorithms…

Machine Learning · Computer Science 2023-02-03 Bram van den Akker, Olivier Jeunen, Ying Li, Ben London, Zahra Nazari, Devesh Parekh

Decentralized Online Big Data Classification - a Bandit Framework

Distributed, online data mining systems have emerged as a result of applications requiring analysis of large amounts of correlated and high-dimensional data produced by multiple distributed data sources. We propose a distributed online data…

Machine Learning · Computer Science 2013-08-27 Cem Tekin, Mihaela van der Schaar

Leveraging Offline Data in Linear Latent Contextual Bandits

Leveraging offline data is an attractive way to accelerate online sequential decision-making. However, it is crucial to account for latent states in users or environments in the offline data, and latent bandits form a compelling model for…

Machine Learning · Computer Science 2025-09-03 Chinmaya Kausik, Kevin Tan, Ambuj Tewari

Optimistic Information Directed Sampling

We study the problem of online learning in contextual bandit problems where the loss function is assumed to belong to a known parametric function class. We propose a new analytic framework for this setting that bridges the Bayesian theory…

Machine Learning · Computer Science 2024-06-28 Gergely Neu, Matteo Papini, Ludovic Schwartz