Related papers: Adaptive Policy Learning Under Unknown Network Int…
Many interventions, such as vaccines in clinical trials or coupons in online marketplaces, must be assigned sequentially without full knowledge of their effects. Multi-armed bandit algorithms have proven successful in such settings.…
This paper studies adaptive targeting under network interference in a bandit setting, where treatments applied to one individual may affect others through spillover effects. We consider a linear model in a sparse regime, where each…
This paper considers the use of a simple posterior sampling algorithm to balance between exploration and exploitation when learning to optimize actions such as in multi-armed bandit problems. The algorithm, also known as Thompson Sampling,…
We consider the problem of controlling an unknown linear quadratic Gaussian (LQG) system consisting of multiple subsystems connected over a network. Our goal is to minimize and quantify the regret (i.e. loss in performance) of our strategy…
This paper studies the problem of optimally allocating treatments in the presence of spillover effects, using information from a (quasi-)experiment. I introduce a method that maximizes the sample analog of average social welfare when…
Causal effect estimation in networked systems is central to data-driven decision making. In such settings, interventions on one unit can spill over to others, and in complex physical or social systems, the interaction pathways driving these…
We consider settings where an allocation has to be chosen repeatedly, returns are unknown but can be learned, and decisions are subject to constraints. Our model covers two-sided and one-sided matching, even with complex constraints. We…
Network interference has attracted significant attention in the field of causal inference, encapsulating various sociological behaviors where the treatment assigned to one individual within a network may affect the outcomes of others, such…
Restless bandit problems assume time-varying reward distributions of the arms, which adds flexibility to the model but makes the analysis more challenging. We study learning algorithms over the unknown reward distributions and prove a…
Although there is now a large literature on policy evaluation and learning, much of the prior work assumes that the treatment assignment of one unit does not affect the outcome of another unit. Unfortunately, ignoring interference can lead…
Performance of adaptive control policies is assessed through the regret with respect to the optimal regulator, which reflects the increase in the operating cost due to uncertainty about the dynamics parameters. However, available results in…
We address online combinatorial optimization when the player has a prior over the adversary's sequence of losses. In this framework, Russo and Van Roy proposed an information-theoretic analysis of Thompson Sampling based on the information…
In model-based solution approaches to the problem of learning in an unknown environment, exploring to learn the model parameters takes a toll on the regret. The optimal performance with respect to regret or PAC bounds is achievable, if the…
Practitioners conducting adaptive experiments often encounter two competing priorities: maximizing total welfare (or `reward') through effective treatment assignment and swiftly concluding experiments to implement population-wide…
Online experimentation with interference is a common challenge in modern applications such as e-commerce and adaptive clinical trials in medicine. For example, in online marketplaces, the revenue of a good depends on discounts applied to…
Thompson sampling has been shown to be an effective policy across a variety of online learning tasks. Many works have analyzed the finite time performance of Thompson sampling, and proved that it achieves a sub-linear regret under a broad…
Online machine learning systems need to adapt to domain shifts. Meanwhile, acquiring label at every timestep is expensive. We propose a surprisingly simple algorithm that adaptively balances its regret and its number of label queries in…
Randomized experiments are widely used to estimate the causal effects of a proposed treatment in many areas of science, from medicine and healthcare to the physical and biological sciences, from the social sciences to engineering, to public…
A common challenge for decision makers is selecting actions whose rewards are unknown and evolve over time based on prior policies. For instance, repeated use may reduce an action's effectiveness (habituation), while inactivity may restore…
Randomized experiments are the gold standard for estimating treatment effects, yet network interference challenges the validity of traditional estimators by violating the stable unit treatment value assumption and introducing bias. While…