Related papers: Adaptive Policy Learning Under Unknown Network Int…

Scalable Policy Maximization Under Network Interference

Many interventions, such as vaccines in clinical trials or coupons in online marketplaces, must be assigned sequentially without full knowledge of their effects. Multi-armed bandit algorithms have proven successful in such settings.…

Machine Learning · Statistics 2026-05-07 Aidan Gleich, Eric Laber, Alexander Volfovsky

Learning to target with network interference

This paper studies adaptive targeting under network interference in a bandit setting, where treatments applied to one individual may affect others through spillover effects. We consider a linear model in a sparse regime, where each…

Machine Learning · Statistics 2026-05-28 Xiaomeng Wang, Hamsa Bastani, Osbert Bastani, Zhimei Ren

Learning to Optimize Via Posterior Sampling

This paper considers the use of a simple posterior sampling algorithm to balance between exploration and exploitation when learning to optimize actions such as in multi-armed bandit problems. The algorithm, also known as Thompson Sampling,…

Machine Learning · Computer Science 2014-02-04 Daniel Russo, Benjamin Van Roy

Scalable regret for learning to control network-coupled subsystems with unknown dynamics

We consider the problem of controlling an unknown linear quadratic Gaussian (LQG) system consisting of multiple subsystems connected over a network. Our goal is to minimize and quantify the regret (i.e. loss in performance) of our strategy…

Systems and Control · Electrical Engineering and Systems Science 2021-08-19 Sagar Sudhakara, Aditya Mahajan, Ashutosh Nayyar, Yi Ouyang

Policy Targeting under Network Interference

This paper studies the problem of optimally allocating treatments in the presence of spillover effects, using information from a (quasi-)experiment. I introduce a method that maximizes the sample analog of average social welfare when…

Econometrics · Economics 2024-04-09 Davide Viviano

On Evolution-Based Models for Experimentation Under Interference

Causal effect estimation in networked systems is central to data-driven decision making. In such settings, interventions on one unit can spill over to others, and in complex physical or social systems, the interaction pathways driving these…

Machine Learning · Statistics 2025-11-27 Sadegh Shirani, Mohsen Bayati

Adaptive Combinatorial Allocation

We consider settings where an allocation has to be chosen repeatedly, returns are unknown but can be learned, and decisions are subject to constraints. Our model covers two-sided and one-sided matching, even with complex constraints. We…

Econometrics · Economics 2020-11-05 Maximilian Kasy, Alexander Teytelboym

Online Experimental Design With Estimation-Regret Trade-off Under Network Interference

Network interference has attracted significant attention in the field of causal inference, encapsulating various sociological behaviors where the treatment assigned to one individual within a network may affect the outcomes of others, such…

Machine Learning · Computer Science 2025-02-11 Zhiheng Zhang, Zichen Wang

Thompson Sampling in Non-Episodic Restless Bandits

Restless bandit problems assume time-varying reward distributions of the arms, which adds flexibility to the model but makes the analysis more challenging. We study learning algorithms over the unknown reward distributions and prove a…

Machine Learning · Computer Science 2019-10-15 Young Hun Jung, Marc Abeille, Ambuj Tewari

Individualized Policy Evaluation and Learning under Clustered Network Interference

Although there is now a large literature on policy evaluation and learning, much of the prior work assumes that the treatment assignment of one unit does not affect the outcome of another unit. Unfortunately, ignoring interference can lead…

Statistical Methodology · Statistics 2025-04-02 Yi Zhang, Kosuke Imai

On Adaptive Linear-Quadratic Regulators

Performance of adaptive control policies is assessed through the regret with respect to the optimal regulator, which reflects the increase in the operating cost due to uncertainty about the dynamics parameters. However, available results in…

Systems and Control · Computer Science 2020-03-24 Mohamad Kazem Shirani Faradonbeh, Ambuj Tewari, George Michailidis

First-Order Bayesian Regret Analysis of Thompson Sampling

We address online combinatorial optimization when the player has a prior over the adversary's sequence of losses. In this framework, Russo and Van Roy proposed an information-theoretic analysis of Thompson Sampling based on the information…

Machine Learning · Computer Science 2022-04-05 Sébastien Bubeck, Mark Sellke

TSEB: More Efficient Thompson Sampling for Policy Learning

In model-based solution approaches to the problem of learning in an unknown environment, exploring to learn the model parameters takes a toll on the regret. The optimal performance with respect to regret or PAC bounds is achievable, if the…

Machine Learning · Computer Science 2015-10-13 P. Prasanna, Sarath Chandar, Balaraman Ravindran

Optimizing Adaptive Experiments: A Unified Approach to Regret Minimization and Best-Arm Identification

Practitioners conducting adaptive experiments often encounter two competing priorities: maximizing total welfare (or 'reward') through effective treatment assignment and swiftly concluding experiments to implement population-wide…

Machine Learning · Computer Science 2024-07-31 Chao Qin, Daniel Russo

Multi-Armed Bandits with Network Interference

Online experimentation with interference is a common challenge in modern applications such as e-commerce and adaptive clinical trials in medicine. For example, in online marketplaces, the revenue of a good depends on discounts applied to…

Machine Learning · Computer Science 2024-05-30 Abhineet Agarwal, Anish Agarwal, Lorenzo Masoero, Justin Whitehouse

Asymptotic Convergence of Thompson Sampling

Thompson sampling has been shown to be an effective policy across a variety of online learning tasks. Many works have analyzed the finite time performance of Thompson sampling, and proved that it achieves a sub-linear regret under a broad…

Machine Learning · Computer Science 2020-11-10 Cem Kalkanli, Ayfer Ozgur

Active Online Learning with Hidden Shifting Domains

Online machine learning systems need to adapt to domain shifts. Meanwhile, acquiring label at every timestep is expensive. We propose a surprisingly simple algorithm that adaptively balances its regret and its number of label queries in…

Machine Learning · Computer Science 2021-03-01 Yining Chen, Haipeng Luo, Tengyu Ma, Chicheng Zhang

Estimating Total Treatment Effect in Randomized Experiments with Unknown Network Structure

Randomized experiments are widely used to estimate the causal effects of a proposed treatment in many areas of science, from medicine and healthcare to the physical and biological sciences, from the social sciences to engineering, to public…

Statistical Methodology · Statistics 2022-11-30 Christina Lee Yu, Edoardo M Airoldi, Christian Borgs, Jennifer T Chayes

Power Constrained Nonstationary Bandits with Habituation and Recovery Dynamics

A common challenge for decision makers is selecting actions whose rewards are unknown and evolve over time based on prior policies. For instance, repeated use may reduce an action's effectiveness (habituation), while inactivity may restore…

Machine Learning · Computer Science 2025-11-06 Fengxu Li, Stephanie M. Carpenter, Matthew P. Buman, Yonatan Mintz

Estimation and inference of average treatment effects under heterogeneous additive treatment effect model

Randomized experiments are the gold standard for estimating treatment effects, yet network interference challenges the validity of traditional estimators by violating the stable unit treatment value assumption and introducing bias. While…

Statistical Methodology · Statistics 2024-09-02 Xin Lu, Hongzi Li, Hanzhong Liu