Related papers: Adapting Behaviour for Learning Progress

Dynamic Memory for Interpretable Sequential Optimisation

Real-world applications of reinforcement learning for recommendation and experimentation face a practical challenge: the relative reward of different bandit arms can evolve over the lifetime of the learning agent. To deal with these…

Machine Learning · Computer Science · 2022-06-29 · Srivas Chennu, Andrew Maher, Jamie Martin, Subash Prabanantham

Adaptive Exploration for Latent-State Bandits

The multi-armed bandit problem is a core framework for sequential decision-making under uncertainty, but classical algorithms often fail in environments with hidden, time-varying states that confound reward estimation and optimal action…

Machine Learning · Computer Science · 2026-02-19 · Jikai Jin, Kenneth Hung, Sanath Kumar Krishnamurthy, Baoyi Shi, Congshan Zhang

Evolution of Information in Interactive Decision Making: A Case Study for Multi-Armed Bandits

We study the evolution of information in interactive decision making through the lens of a stochastic multi-armed bandit problem. Focusing on a fundamental example where a unique optimal arm outperforms the rest by a fixed margin, we…

Machine Learning · Statistics · 2025-10-23 · Yuzhou Gu, Yanjun Han, Jian Qian

Reinforcement Learning in Education: A Multi-Armed Bandit Approach

Advances in reinforcement learning research have demonstrated the ways in which different agent-based models can learn how to optimally perform a task within a given environment. Reinforcement learning solves unsupervised problems where…

Machine Learning · Computer Science · 2022-11-03 · Herkulaas Combrink, Vukosi Marivate, Benjamin Rosman

A Bandit Framework for Optimal Selection of Reinforcement Learning Agents

Deep Reinforcement Learning has been shown to be very successful in complex games, e.g. Atari or Go. These games have clearly defined rules, and hence allow simulation. In many practical applications, however, interactions with the…

Machine Learning · Computer Science · 2019-02-12 · Andreas Merentitis, Kashif Rasul, Roland Vollgraf, Abdul-Saboor Sheikh, Urs Bergmann

Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits

We introduce the "inverse bandit" problem of estimating the rewards of a multi-armed bandit instance from observing the learning process of a low-regret demonstrator. Existing approaches to the related problem of inverse reinforcement…

Machine Learning · Statistics · 2022-02-23 · Wenshuo Guo, Kumar Krishna Agrawal, Aditya Grover, Vidya Muthukumar, Ashwin Pananjady

Optimal Adaptive Learning in Uncontrolled Restless Bandit Problems

In this paper we consider the problem of learning the optimal policy for uncontrolled restless bandit problems. In an uncontrolled restless bandit problem, there is a finite set of arms, each of which when pulled yields a positive reward.…

Optimization and Control · Mathematics · 2015-01-30 · Cem Tekin, Mingyan Liu

An Information-Theoretic Analysis of Nonstationary Bandit Learning

In nonstationary bandit learning problems, the decision-maker must continually gather information and adapt their action selection as the latent state of the environment evolves. In each time period, some latent optimal action maximizes…

Machine Learning · Computer Science · 2023-12-27 · Seungki Min, Daniel Russo

Active Learning with Safety Constraints

Active learning methods have shown great promise in reducing the number of samples necessary for learning. As automated learning systems are adopted into real-time, real-world decision-making pipelines, it is increasingly important that…

Machine Learning · Computer Science · 2022-06-23 · Romain Camilleri, Andrew Wagenmaker, Jamie Morgenstern, Lalit Jain, Kevin Jamieson

Resource Allocation in Multi-armed Bandit Exploration: Overcoming Sublinear Scaling with Adaptive Parallelism

We study exploration in stochastic multi-armed bandits when we have access to a divisible resource that can be allocated in varying amounts to arm pulls. We focus in particular on the allocation of distributed computing resources, where we…

Machine Learning · Computer Science · 2021-06-08 · Brijen Thananjeyan, Kirthevasan Kandasamy, Ion Stoica, Michael I. Jordan, Ken Goldberg, Joseph E. Gonzalez

Machine Teaching of Active Sequential Learners

Machine teaching addresses the problem of finding the best training data that can guide a learning algorithm to a target model with minimal effort. In conventional settings, a teacher provides data that are consistent with the true data…

Machine Learning · Computer Science · 2019-11-04 · Tomi Peltola, Mustafa Mert Çelikok, Pedram Daee, Samuel Kaski

Learning Contextual Bandits in a Non-stationary Environment

Multi-armed bandit algorithms have become a reference solution for handling the explore/exploit dilemma in recommender systems, and many other important real-world problems, such as display advertisement. However, such algorithms usually…

Machine Learning · Computer Science · 2018-05-25 · Qingyun Wu, Naveen Iyer, Hongning Wang

Multi-Armed Bandits for Intelligent Tutoring Systems

We present an approach to Intelligent Tutoring Systems which adaptively personalizes sequences of learning activities to maximize skills acquired by students, taking into account the limited time and motivational resources. At a given point…

Artificial Intelligence · Computer Science · 2019-07-17 · Benjamin Clement, Didier Roy, Pierre-Yves Oudeyer, Manuel Lopes

A New Bandit Setting Balancing Information from State Evolution and Corrupted Context

We propose a new sequential decision-making setting, combining key aspects of two established online learning problems with bandit feedback. The optimal action to play at any given moment is contingent on an underlying changing state which…

Machine Learning · Computer Science · 2023-11-07 · Alexander Galozy, Slawomir Nowaczyk, Mattias Ohlsson

An Adaptive Method for Contextual Stochastic Multi-armed Bandits with Rewards Generated by a Linear Dynamical System

Online decision-making can be formulated as the popular stochastic multi-armed bandit problem where a learner makes decisions (or takes actions) to maximize cumulative rewards collected from an unknown environment. This paper proposes to…

Systems and Control · Electrical Eng. & Systems · 2025-11-26 · Jonathan Gornet, Mehdi Hosseinzadeh, Bruno Sinopoli

The Assistive Multi-Armed Bandit

Learning preferences implicit in the choices humans make is a well studied problem in both economics and computer science. However, most work makes the assumption that humans are acting (noisily) optimally with respect to their preferences.…

Machine Learning · Computer Science · 2019-01-28 · Lawrence Chan, Dylan Hadfield-Menell, Siddhartha Srinivasa, Anca Dragan

Linear Bandits with Limited Adaptivity and Learning Distributional Optimal Design

Motivated by practical needs such as large-scale learning, we study the impact of adaptivity constraints on linear contextual bandits, a central problem in online active learning. We consider two popular limited adaptivity models in…

Machine Learning · Computer Science · 2021-04-26 · Yufei Ruan, Jiaqi Yang, Yuan Zhou

Fractional Moments on Bandit Problems

Reinforcement learning addresses the dilemma between exploration to find profitable actions and exploitation to act according to the best observations already made. Bandit problems are one such class of problems in stateless environments…

Machine Learning · Computer Science · 2012-02-20 · Ananda Narayanan B, Balaraman Ravindran

Satisficing Exploration for Deep Reinforcement Learning

A default assumption in the design of reinforcement-learning algorithms is that a decision-making agent always explores to learn optimal behavior. In sufficiently complex environments that approach the vastness and scale of the real world,…

Machine Learning · Computer Science · 2024-07-23 · Dilip Arumugam, Saurabh Kumar, Ramki Gummadi, Benjamin Van Roy

Reinforcement Learning for Machine Learning Model Deployment: Evaluating Multi-Armed Bandits in ML Ops Environments

In modern ML Ops environments, model deployment is a critical process that traditionally relies on static heuristics such as validation error comparisons and A/B testing. However, these methods require human intervention to adapt to…

Machine Learning · Computer Science · 2025-03-31 · S. Aaron McClendon, Vishaal Venkatesh, Juan Morinelli