Related papers: RELEAF: An Algorithm for Learning and Exploiting R…

Exploiting Relevance for Online Decision-Making in High-Dimensions

Many sequential decision-making tasks require choosing at each decision step the right action out of the vast set of possibilities by extracting actionable intelligence from high-dimensional data streams. Most of the time, the…

Machine Learning · Computer Science 2020-12-29 Eralp Turgay, Cem Bulucu, Cem Tekin

Adaptively Learning to Select-Rank in Online Platforms

Ranking algorithms are fundamental to various online platforms, from e-commerce sites to content streaming services. Our research addresses the challenge of adaptively ranking items from a candidate pool for heterogeneous users, a key…

Machine Learning · Computer Science 2024-06-10 Jingyuan Wang, Perry Dong, Ying Jin, Ruohan Zhan, Zhengyuan Zhou

Bayesian Non-stationary Linear Bandits for Large-Scale Recommender Systems

Taking advantage of contextual information can potentially boost the performance of recommender systems. In the era of big data, such side information often has several dimensions. Thus, developing decision-making algorithms to cope with…

Machine Learning · Computer Science 2023-07-26 Saeed Ghoorchian, Evgenii Kortukov, Setareh Maghsudi

Contextual Bandits and Imitation Learning via Preference-Based Active Queries

We consider the problem of contextual bandits and imitation learning, where the learner lacks direct knowledge of the executed action's reward. Instead, the learner can actively query an expert at each round to compare two actions and…

Machine Learning · Computer Science 2023-07-25 Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu

Model selection for contextual bandits

We introduce the problem of model selection for contextual bandits, where a learner must adapt to the complexity of the optimal policy while balancing exploration and exploitation. Our main result is a new model selection guarantee for…

Machine Learning · Computer Science 2019-11-15 Dylan J. Foster, Akshay Krishnamurthy, Haipeng Luo

Recycling History: Efficient Recommendations from Contextual Dueling Bandits

The contextual duelling bandit problem models adaptive recommender systems, where the algorithm presents a set of items to the user, and the user's choice reveals their preference. This setup is well suited for implicit choices users make…

Machine Learning · Computer Science 2025-08-27 Suryanarayana Sankagiri, Jalal Etesami, Pouria Fatemi, Matthias Grossglauser

Contextual Recommendations and Low-Regret Cutting-Plane Algorithms

We consider the following variant of contextual linear bandits motivated by routing applications in navigational engines and recommendation systems. We wish to learn a hidden $d$-dimensional value $w^*$. Every round, we are presented with a…

Machine Learning · Computer Science 2021-06-10 Sreenivas Gollapudi, Guru Guruganesh, Kostas Kollias, Pasin Manurangsi, Renato Paes Leme, Jon Schneider

Dimension Reduction in Contextual Online Learning via Nonparametric Variable Selection

We consider a contextual online learning (multi-armed bandit) problem with high-dimensional covariate $\mathbf{x}$ and decision $\mathbf{y}$. The reward function to learn, $f(\mathbf{x},\mathbf{y})$, does not have a particular parametric…

Machine Learning · Computer Science 2022-10-04 Wenhao Li, Ningyuan Chen, L. Jeff Hong

Efficient Algorithms for Adversarial Contextual Learning

We provide the first oracle efficient sublinear regret algorithms for adversarial versions of the contextual bandit problem. In this problem, the learner repeatedly makes an action on the basis of a context and receives reward for the…

Machine Learning · Computer Science 2016-02-09 Vasilis Syrgkanis, Akshay Krishnamurthy, Robert E. Schapire

Instance-Dependent Complexity of Contextual Bandits and Reinforcement Learning: A Disagreement-Based Perspective

In the classical multi-armed bandit problem, instance-dependent algorithms attain improved performance on "easy" problems with a gap between the best and second-best arm. Are similar guarantees possible for contextual bandits? While…

Machine Learning · Computer Science 2020-10-08 Dylan J. Foster, Alexander Rakhlin, David Simchi-Levi, Yunzong Xu

Misalignment, Learning, and Ranking: Harnessing Users Limited Attention

In digital health and EdTech, recommendation systems face a significant challenge: users often choose impulsively, in ways that conflict with the platform's long-term payoffs. This misalignment makes it difficult to effectively learn to…

Machine Learning · Computer Science 2024-02-22 Arpit Agarwal, Rad Niazadeh, Prathamesh Patil

Recommending with Recommendations

Recommendation systems are a key modern application of machine learning, but they have the downside that they often draw upon sensitive user information in making their predictions. We show how to address this deficiency by basing a…

Machine Learning · Computer Science 2021-12-03 Naveen Durvasula, Franklyn Wang, Scott Duke Kominers

Contextual Bandit Learning with Predictable Rewards

Contextual bandit learning is a reinforcement learning problem where the learner repeatedly receives a set of features (context), takes an action and receives a reward based on the action and context. We consider this problem under a…

Machine Learning · Computer Science 2012-03-05 Alekh Agarwal, Miroslav Dudík, Satyen Kale, John Langford, Robert E. Schapire

Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability

We study the $K$-armed contextual dueling bandit problem, a sequential decision making setting in which the learner uses contextual information to make two decisions, but only observes preference-based feedback suggesting that one…

Machine Learning · Computer Science 2021-11-25 Aadirupa Saha, Akshay Krishnamurthy

Learning Neural Contextual Bandits Through Perturbed Rewards

Thanks to the power of representation learning, neural contextual bandit algorithms demonstrate remarkable performance improvement against their classical counterparts. But because their exploration has to be performed in the entire neural…

Machine Learning · Computer Science 2022-03-22 Yiling Jia, Weitong Zhang, Dongruo Zhou, Quanquan Gu, Hongning Wang

Online Model Selection for Reinforcement Learning with Function Approximation

Deep reinforcement learning has achieved impressive successes yet often requires a very large amount of interaction data. This result is perhaps unsurprising, as using complicated function approximation often requires more data to fit, and…

Machine Learning · Computer Science 2020-11-20 Jonathan N. Lee, Aldo Pacchiano, Vidya Muthukumar, Weihao Kong, Emma Brunskill

Leveraging Good Representations in Linear Contextual Bandits

The linear contextual bandit literature is mostly focused on the design of efficient learning algorithms for a given representation. However, a contextual bandit problem may admit multiple linear representations, each one with different…

Machine Learning · Computer Science 2021-04-09 Matteo Papini, Andrea Tirinzoni, Marcello Restelli, Alessandro Lazaric, Matteo Pirotta

High-dimensional Nonparametric Contextual Bandit Problem

We consider the kernelized contextual bandit problem with a large feature space. This problem involves $K$ arms, and the goal of the forecaster is to maximize the cumulative rewards through learning the relationship between the contexts and…

Machine Learning · Statistics 2025-05-21 Shogo Iwazaki, Junpei Komiyama, Masaaki Imaizumi

Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces

Designing efficient general-purpose contextual bandit algorithms that work with large -- or even continuous -- action spaces would facilitate application to important scenarios such as information retrieval, recommendation systems, and…

Machine Learning · Computer Science 2022-07-14 Yinglun Zhu, Paul Mineiro

Doubly-Robust Lasso Bandit

Contextual multi-armed bandit algorithms are widely used in sequential decision tasks such as news article recommendation systems, web page ad placement algorithms, and mobile health. Most of the existing algorithms have regret proportional…

Machine Learning · Statistics 2020-02-14 Gi-Soo Kim, Myunghee Cho Paik