Related papers: Contextual Blocking Bandits

Combinatorial Blocking Bandits with Stochastic Delays

Recent work has considered natural variations of the multi-armed bandit problem, where the reward distribution of each arm is a special function of the time passed since its last pulling. In this direction, a simple (yet widely applicable)…

Machine Learning · Computer Science 2021-05-25 Alexia Atsidakou , Orestis Papadigenopoulos , Soumya Basu , Constantine Caramanis , Sanjay Shakkottai

Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests

A contextual bandit problem is studied in a highly non-stationary environment, which is ubiquitous in various recommender systems due to the time-varying interests of users. Two models with disjoint and hybrid payoffs are considered to…

Machine Learning · Computer Science 2020-03-03 Xiao Xu , Fang Dong , Yanghua Li , Shaojian He , Xin Li

Adaptive Bandit Algorithms for Contextual Matching Markets

We study bandit learning in matching markets, where players and arms constitute the two market sides, and the players' utilities are linear in the arm contexts. In each round, new arms arrive with observable contexts. Then, the algorithm…

Machine Learning · Computer Science 2026-05-28 Shiyun Lin , Simon Mauras , Vianney Perchet , Nadav Merlis

Blocking Bandits

We consider a novel stochastic multi-armed bandit setting, where playing an arm makes it unavailable for a fixed number of time slots thereafter. This models situations where reusing an arm too often is undesirable (e.g. making the same…

Machine Learning · Computer Science 2024-07-31 Soumya Basu , Rajat Sen , Sujay Sanghavi , Sanjay Shakkottai

Contextual Bandits with Similarity Information

In a multi-armed bandit (MAB) problem, an online algorithm makes a sequence of choices. In each round it chooses from a time-invariant set of alternatives and receives the payoff associated with this alternative. While the case of small…

Data Structures and Algorithms · Computer Science 2014-05-21 Aleksandrs Slivkins

Multi-Armed Bandits with Minimum Aggregated Revenue Constraints

We examine a multi-armed bandit problem with contextual information, where the objective is to ensure that each arm receives a minimum aggregated reward across contexts while simultaneously maximizing the total cumulative reward. This…

Machine Learning · Computer Science 2025-10-15 Ahmed Ben Yahmed , Hafedh El Ferchichi , Marc Abeille , Vianney Perchet

Multi-Armed Bandits with Censored Consumption of Resources

We consider a resource-aware variant of the classical multi-armed bandit problem: In each round, the learner selects an arm and determines a resource limit. It then observes a corresponding (random) reward, provided the (random) amount of…

Machine Learning · Computer Science 2022-10-18 Viktor Bengs , Eyke Hüllermeier

Risk-averse Contextual Multi-armed Bandit Problem with Linear Payoffs

In this paper we consider the contextual multi-armed bandit problem for linear payoffs under a risk-averse criterion. At each round, contexts are revealed for each arm, and the decision maker chooses one arm to pull and receives the…

Machine Learning · Computer Science 2022-06-28 Yifan Lin , Yuhao Wang , Enlu Zhou

Learning Contextual Bandits in a Non-stationary Environment

Multi-armed bandit algorithms have become a reference solution for handling the explore/exploit dilemma in recommender systems, and many other important real-world problems, such as display advertisement. However, such algorithms usually…

Machine Learning · Computer Science 2018-05-25 Qingyun Wu , Naveen Iyer , Hongning Wang

Stochastic Contextual Bandits with Known Reward Functions

Many sequential decision-making problems in communication networks can be modeled as contextual bandit problems, which are natural extensions of the well-known multi-armed bandit problem. In contextual bandit problems, at each time, an…

Machine Learning · Computer Science 2016-05-10 Pranav Sakulkar , Bhaskar Krishnamachari

Multi-Task Learning for Contextual Bandits

Contextual bandits are a form of multi-armed bandit in which the agent has access to predictive side information (known as the context) for each arm at each time step, and have been used to model personalized news recommendation, ad…

Machine Learning · Statistics 2017-05-25 Aniket Anand Deshmukh , Urun Dogan , Clayton Scott

BanditQ: Fair Bandits with Guaranteed Rewards

Classic no-regret multi-armed bandit algorithms, including the Upper Confidence Bound (UCB), Hedge, and EXP3, are inherently unfair by design. Their unfairness stems from their objective of playing the most rewarding arm as frequently as…

Machine Learning · Computer Science 2024-05-14 Abhishek Sinha

Congested Bandits: Optimal Routing via Short-term Resets

For traffic routing platforms, the choice of which route to recommend to a user depends on the congestion on these routes -- indeed, an individual's utility depends on the number of people using the recommended route at that instance.…

Machine Learning · Computer Science 2023-01-24 Pranjal Awasthi , Kush Bhatia , Sreenivas Gollapudi , Kostas Kollias

Strategic Multi-Armed Bandit Problems Under Debt-Free Reporting

We consider the classical multi-armed bandit problem, but with strategic arms. In this context, each arm is characterized by a bounded support reward distribution and strategically aims to maximize its own utility by potentially retaining a…

Machine Learning · Computer Science 2025-01-28 Ahmed Ben Yahmed , Clément Calauzènes , Vianney Perchet

Contextual Bandit with Missing Rewards

We consider a novel variant of the contextual bandit problem (i.e., the multi-armed bandit with side-information, or context, available to a decision-maker) where the reward associated with each context-based decision may not always be…

Machine Learning · Computer Science 2020-07-21 Djallel Bouneffouf , Sohini Upadhyay , Yasaman Khazaeni

Neural Dueling Bandits: Preference-Based Optimization with Human Feedback

Contextual dueling bandit is used to model the bandit problems, where a learner's goal is to find the best arm for a given context using observed noisy human preference feedback over the selected arms for the past contexts. However,…

Machine Learning · Computer Science 2025-04-17 Arun Verma , Zhongxiang Dai , Xiaoqiang Lin , Patrick Jaillet , Bryan Kian Hsiang Low

Multiplayer Information Asymmetric Contextual Bandits

Single-player contextual bandits are a well-studied problem in reinforcement learning that has seen applications in various fields such as advertising, healthcare, and finance. In light of the recent work on \emph{information asymmetric}…

Machine Learning · Computer Science 2025-03-13 William Chang , Yuanhao Lu

Contextual Bandits with Cross-learning

In the classical contextual bandits problem, in each round $t$, a learner observes some context $c$, chooses some action $i$ to perform, and receives some reward $r_{i,t}(c)$. We consider the variant of this problem where in addition to…

Machine Learning · Computer Science 2021-11-17 Santiago Balseiro , Negin Golrezaei , Mohammad Mahdian , Vahab Mirrokni , Jon Schneider

Best-of-Both-Worlds Linear Contextual Bandits

This study investigates the problem of $K$-armed linear contextual bandits, an instance of the multi-armed bandit problem, under an adversarial corruption. At each round, a decision-maker observes an independent and identically distributed…

Machine Learning · Computer Science 2023-12-29 Masahiro Kato , Shinji Ito

Regret Minimization in Stochastic Contextual Dueling Bandits

We consider the problem of stochastic $K$-armed dueling bandit in the contextual setting, where at each round the learner is presented with a context set of $K$ items, each represented by a $d$-dimensional feature vector, and the goal of…

Machine Learning · Computer Science 2021-05-11 Aadirupa Saha , Aditya Gopalan