Related papers: YEAST: Yet Another Sequential Test
Experimental testing is vital in the optimization of web applications, and as such A/B testing has been widely adopted as a methodology for determining optimal content for many web applications. While some testing platforms provide…
AB testing aids business operators with their decision making, and is considered the gold standard method for learning from data to improve digital user experiences. However, there is usually a gap between the requirements of practitioners,…
Tech companies (e.g., Google or Facebook) often use randomized online experiments and/or A/B testing primarily based on the average treatment effects to compare their new product with an old one. However, it is also critically important to…
We propose a new framework for online testing of heterogeneous treatment effects. The proposed test, named sequential score test (SST), is able to control type I error under continuous monitoring and detect multi-dimensional heterogeneous…
Experimental design has emerged as a powerful approach for improving the sample efficiency of A/B testing, yet existing designs rely critically on correctly specified models. We study robust sequential experimental design under model…
A/B testing is one of the most successful applications of statistical theory in modern Internet age. One problem of Null Hypothesis Statistical Testing (NHST), the backbone of A/B testing methodology, is that experimenters are not allowed…
Online experiments in internet systems, also known as A/B tests, are used for a wide range of system tuning problems, such as optimizing recommender system ranking policies and learning adaptive streaming controllers. Decision-makers…
We consider a quantum system that is being continuously monitored, giving rise to a measurement signal. From such a stream of data, information needs to be inferred about the underlying system's dynamics. Here we focus on hypothesis testing…
When interpreting A/B tests, we typically focus only on the statistically significant results and take them by face value. This practice, termed post-selection inference in the statistical literature, may negatively affect both point…
A/B tests are the gold standard for evaluating digital experiences on the web. However, traditional "fixed-horizon" statistical methods are often incompatible with the needs of modern industry practitioners as they do not permit continuous…
With the growing needs of online A/B testing to support the innovation in industry, the opportunity cost of running an experiment becomes non-negligible. Therefore, there is an increasing demand for an efficient continuous monitoring…
Sequential decision making significantly speeds up research and is more cost-effective compared to fixed-n methods. We present a method for sequential decision making for stratified count data that retains Type-I error guarantee or false…
The widespread adoption of online randomized controlled experiments (A/B Tests) for decision-making has created ongoing capacity constraints which necessitate interim analyses. As a consequence, platform users are increasingly motivated to…
This review aims to provide a comprehensive update on the progress made on the Sequential Testing problem (STP) in the last 20 years after the review, [1] was published. Many studies have provided new theoretical results, extensions of the…
This work investigates the sequential hypothesis testing problem with online sensor selection and sensor usage constraints. That is, in a sensor network, the fusion center sequentially acquires samples by selecting one "most informative"…
A/B tests are typically analyzed via frequentist p-values and confidence intervals; but these inferences are wholly unreliable if users endogenously choose samples sizes by *continuously monitoring* their tests. We define *always valid*…
Linear models are foundational tools in statistics and ubiquitous across the applied sciences. However, conventional statistical inference -- such as $t$-tests and $F$-tests -- are only valid at fixed sample sizes, making them unsuitable…
The rollout of new versions of a feature in modern applications is a manual multi-stage process, as the feature is released to ever larger groups of users, while its performance is carefully monitored. This kind of A/B testing is…
Randomized experiments ensure robust causal inference that are critical to effective learning analytics research and practice. However, traditional randomized experiments, like A/B tests, are limiting in large scale digital learning…
The growing need for synthetic time series, due to data augmentation or privacy regulations, has led to numerous generative models, frameworks, and evaluation measures alike. Objectively comparing these measures on a large scale remains an…