Related papers: Meta-Thompson Sampling

200 papers

Meta-learning is characterized by its ability to learn how to learn, enabling the adaptation of learning strategies across different tasks. Recent research introduced the Meta-Thompson Sampling (Meta-TS), which meta-learns an unknown prior…

Machine Learning · Statistics 2024-09-12 Hao Li, Dong Liang, Zheng Xie

We propose AdaTS, a Thompson sampling algorithm that adapts sequentially to bandit tasks that it interacts with. The key idea in AdaTS is to adapt to an unknown task prior distribution by maintaining a distribution over its…

Machine Learning · Computer Science 2022-02-28 Soumya Basu, Branislav Kveton, Manzil Zaheer, Csaba Szepesvári

Fully Bayesian approaches to sequential decision-making assume that problem parameters are generated from a known prior. In practice, such information is often lacking. This problem is exacerbated in setups with partial information, where a…

Machine Learning · Statistics 2022-08-08 Amit Peleg, Naama Pearl, Ron Meir

How to explore efficiently is a central problem in multi-armed bandits. In this paper, we introduce the metadata-based multi-task bandit problem, where the agent needs to solve a large number of related multi-armed bandit tasks and can…

Machine Learning · Computer Science 2021-08-17 Runzhe Wan, Lin Ge, Rui Song

Online learning in large-scale structured bandits is known to be challenging due to the curse of dimensionality. In this paper, we propose a unified meta-learning framework for a general class of structured bandit problems where the…

Machine Learning · Computer Science 2022-03-01 Runzhe Wan, Lin Ge, Rui Song

We study the problem of meta-learning several contextual stochastic bandits tasks by leveraging their concentration around a low-dimensional affine subspace, which we learn via online principal component analysis to reduce the expected…

Machine Learning · Computer Science 2024-04-02 Steven Bilaj, Sofien Dhouib, Setareh Maghsudi

Recent advances in deep reinforcement learning have made significant strides in performance on applications such as Go and Atari games. However, developing practical methods to balance exploration and exploitation in complex domains remains…

Machine Learning · Statistics 2018-02-27 Carlos Riquelme, George Tucker, Jasper Snoek

Meta-, multi-task, and federated learning can be all viewed as solving similar tasks, drawn from a distribution that reflects task similarities. We provide a unified view of all these problems, as learning to act in a hierarchical Bayesian…

Machine Learning · Computer Science 2022-03-08 Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh

Much of the recent literature on bandit learning focuses on algorithms that aim to converge on an optimal action. One shortcoming is that this orientation does not account for time sensitivity, which can play a crucial role when learning an…

Machine Learning · Computer Science 2020-01-09 Daniel Russo, Benjamin Van Roy

The literature on bandit learning and regret analysis has focused on contexts where the goal is to converge on an optimal action in a manner that limits exploration costs. One shortcoming imposed by this orientation is that it does not…

Machine Learning · Computer Science 2017-05-01 Daniel Russo, David Tse, Benjamin Van Roy

Thompson sampling has proven effective across a wide range of stationary bandit environments. However, as we demonstrate in this paper, it can perform poorly when applied to non-stationary environments. We attribute such failures to the…

Machine Learning · Computer Science 2025-05-06 Yueyang Liu, Xu Kuang, Benjamin Van Roy

We develop a meta-learning framework for simple regret minimization in bandits. In this framework, a learning agent interacts with a sequence of bandit tasks, which are sampled i.i.d. from an unknown prior distribution, and learns its…

Machine Learning · Computer Science 2023-07-06 Mohammadjavad Azizi, Branislav Kveton, Mohammad Ghavamzadeh, Sumeet Katariya

We study Thompson sampling (TS) in online decision making, where the uncertain environment is sampled from a mixture distribution. This is relevant in multi-task learning, where a learning agent faces different classes of problems. We…

Machine Learning · Computer Science 2022-03-08 Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh, Craig Boutilier

This paper considers the use of a simple posterior sampling algorithm to balance between exploration and exploitation when learning to optimize actions such as in multi-armed bandit problems. The algorithm, also known as Thompson Sampling,…

Machine Learning · Computer Science 2014-02-04 Daniel Russo, Benjamin Van Roy

Contextual multi-armed bandits are classical models in reinforcement learning for sequential decision-making associated with individual information. A widely-used policy for bandits is Thompson Sampling, where samples from a data-driven…

Machine Learning · Statistics 2021-11-30 Hongju Park, Mohamad Kazem Shirani Faradonbeh

In this paper, we analyze and extend an online learning framework known as Context-Attentive Bandit, motivated by various practical applications, from medical diagnosis to dialog systems, where due to observation costs only a small subset…

Machine Learning · Computer Science 2020-10-20 Djallel Bouneffouf, Raphaël Féraud, Sohini Upadhyay, Yasaman Khazaeni, Irina Rish

The problem of two-sided matching markets has a wide range of real-world applications and has been extensively studied in the literature. A line of recent work has focused on the problem setting where the preferences of one-side market…

Machine Learning · Computer Science 2022-05-03 Fang Kong, Junming Yin, Shuai Li

A contextual bandit is a popular framework for online learning to act under uncertainty. In practice, the number of actions is huge and their expected rewards are correlated. In this work, we introduce a general framework for capturing such…

Machine Learning · Computer Science 2023-03-07 Imad Aouali, Branislav Kveton, Sumeet Katariya

Thompson Sampling, one of the oldest heuristics for solving multi-armed bandits, has recently been shown to demonstrate state-of-the-art performance. The empirical success has led to great interest in theoretical understanding of this…

Machine Learning · Computer Science 2013-10-29 Lihong Li

Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-armed bandit problems. In this paper, we propose a new algorithm, called Neural Thompson Sampling, which adapts deep neural networks for both…

Machine Learning · Computer Science 2022-01-03 Weitong Zhang, Dongruo Zhou, Lihong Li, Quanquan Gu