Related papers: Adaptive Data Augmentation with Multi-armed Bandit…

Adam with Bandit Sampling for Deep Learning

Adam is a widely used optimization method for training deep learning models. It computes individual adaptive learning rates for different parameters. In this paper, we propose a generalization of Adam, called Adambs, that allows us to also…

Machine Learning · Computer Science 2020-10-27 Rui Liu, Tianyi Wu, Barzan Mozafari

MAB Optimizer for Estimating Math Question Difficulty via Inverse CV without NLP

The evolution of technology and education is driving the emergence of Intelligent & Autonomous Tutoring Systems (IATS), where objective and domain-agnostic methods for determining question difficulty are essential. Traditional human…

Artificial Intelligence · Computer Science 2025-09-03 Surajit Das, Gourav Roy, Aleksei Eliseev, Ram Kumar Rajendran

ADAM Optimization with Adaptive Batch Selection

Adam is a widely used optimizer in neural network training due to its adaptive learning rate. However, because different data samples influence model updates to varying degrees, treating them equally can lead to inefficient convergence. To…

Machine Learning · Statistics 2025-12-09 Gyu Yeol Kim, Min-hwan Oh

Valid Best-Model Identification for LLM Evaluation via Low-Rank Factorization

Selecting the best large language model (LLM) for a fixed benchmark is often expensive, since exhaustive evaluation requires running every model on every example. Multi-armed bandit (MAB) algorithms can reduce the number of LLM calls by…

Machine Learning · Computer Science 2026-05-12 Elad Tolochinsky, Yaniv Tenzer, Yaniv Romano

Integrating Multi-Armed Bandit, Active Learning, and Distributed Computing for Scalable Optimization

Modern optimization problems in scientific and engineering domains often rely on expensive black-box evaluations, such as those arising in physical simulations or deep learning pipelines, where gradient information is unavailable or…

Computation · Statistics 2026-01-05 Foo Hui-Mean, Yuan-chin Ivan Chang

Adaptive Data Exploitation in Deep Reinforcement Learning

We introduce ADEPT: Adaptive Data ExPloiTation, a simple yet powerful framework to enhance the data efficiency and generalization in deep reinforcement learning (RL). Specifically, ADEPT adaptively manages the use of sampled data…

Machine Learning · Computer Science 2025-01-23 Mingqi Yuan, Bo Li, Xin Jin, Wenjun Zeng

A Contextual Bandits Approach for Personalization of Hand Gesture Recognition

In human-computer interaction applications like hand gesture recognition, supervised learning models are often trained on a large population of users to achieve high task accuracy. However, due to individual variability in sensor signals…

Human-Computer Interaction · Computer Science 2025-09-12 Duke Lin, Michael Paskett, Ying Yang

Statistical Inference on Multi-armed Bandits with Delayed Feedback

Multi-armed bandit (MAB) algorithms have been increasingly used to complement or integrate with A/B tests and randomized clinical trials in e-commerce, healthcare, and policymaking. Recent developments incorporate possible delayed feedback.…

Methodology · Statistics 2023-07-04 Lei Shi, Jingshen Wang, Tianhao Wu

Beyond Static Bias: Adaptive Multi-Fidelity Bandits with Improving Proxies

As an extension of the classical multi-armed bandit problem, multi-fidelity multi-armed bandits (MF-MAB) enable individual arms to be evaluated using diverse feedback sources that vary in both cost and accuracy. Prior stochastic models…

Machine Learning · Computer Science 2026-05-12 Muyun Lu, Haoyang Hong, Huazheng Wang, Ying Lin

AdaptiveBandit: A multi-armed bandit framework for adaptive sampling in molecular simulations

Sampling from the equilibrium distribution has always been a major problem in molecular simulations due to the very high dimensionality of conformational space. Over several decades, many approaches have been used to overcome the problem.…

Computational Physics · Physics 2020-03-02 Adrià Pérez, Pablo Herrera-Nieto, Stefan Doerr, Gianni De Fabritiis

Nearly Optimal Adaptive Procedure with Change Detection for Piecewise-Stationary Bandit

Multi-armed bandit (MAB) is a class of online learning problems where a learning agent aims to maximize its expected cumulative reward while repeatedly selecting to pull arms with unknown reward distributions. We consider a scenario where…

Machine Learning · Statistics 2019-01-25 Yang Cao, Zheng Wen, Branislav Kveton, Yao Xie

When Do We Need LLMs? A Diagnostic for Language-Driven Bandits

We study Contextual Multi-Armed Bandits (CMABs) for non-episodic sequential decision making problems where the context includes both textual and numerical information (e.g., recommendation systems, dynamic portfolio adjustments, offer…

Artificial Intelligence · Computer Science 2026-04-08 Uljad Berdica, Fernando Acero, Anton Ipsen, Parisa Zehtabi, Michael Cashmore, Manuela Veloso

Adaptively Optimize Content Recommendation Using Multi Armed Bandit Algorithms in E-commerce

E-commerce sites strive to provide users with the most timely and relevant information in order to reduce shopping friction and increase customer satisfaction. Multi-armed bandit (MAB) models, as a type of adaptive optimization algorithm, provide…

Information Retrieval · Computer Science 2021-08-23 Ding Xiang, Becky West, Jiaqi Wang, Xiquan Cui, Jinzhou Huang

An Incentive Compatible Multi-Armed-Bandit Crowdsourcing Mechanism with Quality Assurance

Consider a requester who wishes to crowdsource a series of identical binary labeling tasks to a pool of workers so as to achieve an assured accuracy for each task, in a cost optimal way. The workers are heterogeneous with unknown but fixed…

Computer Science and Game Theory · Computer Science 2015-06-18 Shweta Jain, Sujit Gujar, Satyanath Bhat, Onno Zoeter, Y. Narahari

Valid Post-Contextual Bandit Inference

We establish an asymptotic framework for the statistical analysis of the stochastic contextual multi-armed bandit problem (CMAB), which is widely employed in adaptively randomized experiments across various fields. While algorithms for…

Econometrics · Economics 2025-05-21 Ramon van den Akker, Bas J. M. Werker, Bo Zhou

Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback

Recent works on Multi-Armed Bandits (MAB) and Combinatorial Multi-Armed Bandits (COM-MAB) show good results on a global accuracy metric. This can be achieved, in the case of recommender systems, with personalization. However, with a…

Machine Learning · Computer Science 2020-09-17 Alexandre Letard, Tassadit Amghar, Olivier Camp, Nicolas Gutowski

Offline Learning for Combinatorial Multi-armed Bandits

The combinatorial multi-armed bandit (CMAB) is a fundamental sequential decision-making framework, extensively studied over the past decade. However, existing work primarily focuses on the online setting, overlooking the substantial costs…

Machine Learning · Computer Science 2025-05-30 Xutong Liu, Xiangxiang Dai, Jinhang Zuo, Siwei Wang, Carlee Joe-Wong, John C. S. Lui, Wei Chen

Solving Multi-Arm Bandit Using a Few Bits of Communication

The multi-armed bandit (MAB) problem is an active learning framework that aims to select the best among a set of actions by sequentially observing rewards. Recently, it has become popular for a number of applications over wireless networks,…

Machine Learning · Computer Science 2021-11-12 Osama A. Hanna, Lin F. Yang, Christina Fragouli

A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback

We investigate the problem of stochastic, combinatorial multi-armed bandits where the learner only has access to bandit feedback and the reward function can be non-linear. We provide a general framework for adapting discrete offline…

Machine Learning · Computer Science 2023-10-13 Guanyu Nie, Yididiya Y Nadew, Yanhui Zhu, Vaneet Aggarwal, Christopher John Quinn

Functional multi-armed bandit and the best function identification problems

Bandit optimization usually refers to the class of online optimization problems with limited feedback, namely, a decision maker uses only the objective value at the current point to make a new decision and does not have access to the…

Machine Learning · Computer Science 2026-02-18 Yuriy Dorn, Aleksandr Katrutsa, Ilgam Latypov, Anastasiia Soboleva