Related papers: Towards More Efficient, Robust, Instance-adaptive,…

Risk-Sensitive and Robust Model-Based Reinforcement Learning and Planning

Many sequential decision-making problems that are currently automated, such as those in manufacturing or recommender systems, operate in an environment where there is either little uncertainty, or zero risk of catastrophe. As companies and…

Machine Learning · Computer Science 2023-04-04 Marc Rigter

Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective

In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm. Are similar guarantees possible for contextual bandits? While…

Machine Learning · Computer Science 2020-10-08 Dylan J. Foster, Alexander Rakhlin, David Simchi-Levi, Yunzong Xu

Robust Reinforcement Learning Objectives for Sequential Recommender Systems

Attention-based sequential recommendation methods have shown promise in accurately capturing users' evolving interests from their past interactions. Recent research has also explored the integration of reinforcement learning (RL) into these…

Machine Learning · Computer Science 2024-04-19 Melissa Mozifian, Tristan Sylvain, Dave Evans, Lili Meng

Reinforcement Learning for Machine Learning Model Deployment: Evaluating Multi-Armed Bandits in ML Ops Environments

In modern ML Ops environments, model deployment is a critical process that traditionally relies on static heuristics such as validation error comparisons and A/B testing. However, these methods require human intervention to adapt to…

Machine Learning · Computer Science 2025-03-31 S. Aaron McClendon, Vishaal Venkatesh, Juan Morinelli

Selective Reviews of Bandit Problems in AI via a Statistical View

Reinforcement Learning (RL) is a widely researched area in artificial intelligence that focuses on teaching agents decision-making through interactions with their environment. A key subset includes stochastic multi-armed bandit (MAB) and…

Machine Learning · Statistics 2025-02-20 Pengjie Zhou, Haoyu Wei, Huiming Zhang

Identifiable Latent Bandits: Leveraging observational data for personalized decision-making

Sequential decision-making algorithms such as multi-armed bandits can find optimal personalized decisions, but are notoriously sample-hungry. In personalized medicine, for example, training a bandit from scratch for every patient is…

Machine Learning · Computer Science 2026-05-12 Ahmet Zahid Balcıoğlu, Newton Mwai, Emil Carlsson, Fredrik D. Johansson

Statistical and Algorithmic Foundations of Reinforcement Learning

As a paradigm for sequential decision making in unknown environments, reinforcement learning (RL) has received a flurry of attention in recent years. However, the explosion of model complexity in emerging applications and the presence of…

Machine Learning · Statistics 2025-07-22 Yuejie Chi, Yuxin Chen, Yuting Wei

Contextual Bandits with Large Action Spaces: Made Practical

A central problem in sequential decision making is to develop algorithms that are practical and computationally efficient, yet support the use of flexible, general-purpose models. Focusing on the contextual bandit problem, recent progress…

Machine Learning · Computer Science 2022-07-14 Yinglun Zhu, Dylan J. Foster, John Langford, Paul Mineiro

Prompt-Tuning Bandits: Enabling Few-Shot Generalization for Efficient Multi-Task Offline RL

Prompting has emerged as the dominant paradigm for adapting large, pre-trained transformer-based models to downstream tasks. The Prompting Decision Transformer (PDT) enables large-scale, multi-task offline Reinforcement Learning (RL)…

Machine Learning · Computer Science 2025-07-21 Finn Rietz, Oleg Smirnov, Sara Karimi, Lele Cao

Bayesian Design Principles for Frequentist Sequential Learning

We develop a general theory to optimize the frequentist regret for sequential learning problems, where efficient bandit and reinforcement learning algorithms can be derived from unified Bayesian principles. We propose a novel optimization…

Machine Learning · Computer Science 2024-02-12 Yunbei Xu, Assaf Zeevi

Design Principles of Robust Multi-Armed Bandit Framework in Video Recommendations

Current multi-armed bandit approaches in recommender systems (RS) have focused more on devising effective exploration techniques, while not adequately addressing common exploitation challenges related to distributional changes and item…

Information Retrieval · Computer Science 2023-10-04 Belhassen Bayar, Phanideep Gampa, Ainur Yessenalina, Zhen Wen

Contextual Bandits for adapting to changing User preferences over time

Contextual bandits provide an effective way to model the dynamic data problem in ML by leveraging online (incremental) learning to continuously adjust the predictions based on changing environment. We explore details on contextual bandits,…

Machine Learning · Computer Science 2020-09-24 Dattaraj Rao

LLMs-augmented Contextual Bandit

Contextual bandits have emerged as a cornerstone in reinforcement learning, enabling systems to make decisions with partial feedback. However, as contexts grow in complexity, traditional bandit algorithms can face challenges in adequately…

Machine Learning · Computer Science 2023-11-07 Ali Baheri, Cecilia O. Alm

Reinforcement Learning and Bandits for Speech and Language Processing: Tutorial, Review and Outlook

In recent years, reinforcement learning and bandits have transformed a wide range of real-world applications including healthcare, finance, recommendation systems, robotics, and last but not least, the speech and natural language…

Artificial Intelligence · Computer Science 2023-10-20 Baihan Lin

Sequential Batch Learning in Finite-Action Linear Contextual Bandits

We study the sequential batch learning problem in linear contextual bandits with finite action sets, where the decision maker is constrained to split incoming individuals into (at most) a fixed number of batches and can only observe…

Machine Learning · Computer Science 2020-04-15 Yanjun Han, Zhengqing Zhou, Zhengyuan Zhou, Jose Blanchet, Peter W. Glynn, Yinyu Ye

Reinforcement learning with combinatorial actions for coupled restless bandits

Reinforcement learning (RL) has increasingly been applied to solve real-world planning problems, with progress in handling large state spaces and time horizons. However, a key bottleneck in many domains is that RL methods cannot accommodate…

Machine Learning · Computer Science 2025-03-19 Lily Xu, Bryan Wilder, Elias B. Khalil, Milind Tambe

On Learning to Rank Long Sequences with Contextual Bandits

Motivated by problems of learning to rank long item sequences, we introduce a variant of the cascading bandit model that considers flexible length sequences with varying rewards and losses. We formulate two generative models for this…

Machine Learning · Computer Science 2022-09-05 Anirban Santara, Claudio Gentile, Gaurav Aggarwal, Shuai Li

Fitting Reinforcement Learning Model to Behavioral Data under Bandits

We consider the problem of fitting a reinforcement learning (RL) model to some given behavioral data under a multi-armed bandit environment. These models have received much attention in recent years for characterizing human and animal…

Computational Engineering, Finance, and Science · Computer Science 2026-03-27 Hao Zhu, Jasper Hoffmann, Baohe Zhang, Joschka Boedecker

Survey: Multi-Armed Bandits Meet Large Language Models

Bandit algorithms and Large Language Models (LLMs) have emerged as powerful tools in artificial intelligence, each addressing distinct yet complementary challenges in decision-making and natural language processing. This survey explores the…

Artificial Intelligence · Computer Science 2025-10-01 Djallel Bouneffouf, Raphael Feraud

When and Whom to Collaborate with in a Changing Environment: A Collaborative Dynamic Bandit Solution

Collaborative bandit learning, i.e., bandit algorithms that utilize collaborative filtering techniques to improve sample efficiency in online interactive recommendation, has attracted much research attention as it enjoys the best of both…

Machine Learning · Computer Science 2021-04-16 Chuanhao Li, Qingyun Wu, Hongning Wang