Statistics
The analysis of platform trials can be enhanced by utilizing non-concurrent controls. Since including this data might also introduce bias in the treatment effect estimators if time trends are present, methods for incorporating…
The solar spectral irradiance (SSI) depicts the spectral distribution of solar energy flux reaching the top of the Earth's atmosphere. Daily SSI measurements constitute a matrix with spectrally (rows) and temporally (columns) resolved solar…
Variational inference, as an alternative to Markov chain Monte Carlo sampling, has played a transformative role in enabling scalable computation for complex Bayesian models. Nevertheless, existing approaches often depend on either rigid…
This Element offers a practical guide to estimating conditional marginal effects-how treatment effects vary with a moderating variable-using modern statistical methods. Commonly used approaches, such as linear interaction models, often…
The multi-armed bandits (MAB) framework is a widely used approach for sequential decision-making, where a decision-maker selects an arm in each round with the goal of maximizing long-term rewards. In many practical applications, such as…
Large language models (LLMs) are increasingly used to simulate survey responses, but synthetic data can be misaligned with the human population, leading to unreliable inference. We develop a general framework that converts LLM-simulated…
Traditionally, studies in experimental physiology have been conducted in small groups of human participants, animal models or cell lines. Identifying optimal study designs that achieve sufficient power for drawing proper statistical…
In graph-based data analysis, $k$-nearest neighbor ($k$NN) graphs are widely used due to their adaptivity to local data densities. Allowing weighted edges in the graph, the kernelized graph affinity provides a more general type of $k$NN…
Recent years have seen a surge in methods for two-sample testing, among which the Maximum Mean Discrepancy (MMD) test has emerged as an effective tool for handling complex and high-dimensional data. Despite its success and widespread…
Invariant causal prediction provides a useful framework for identifying causal predictors of a response using heterogeneous data from multiple environments. One valuable property of the original invariant causal prediction method is that it…
Bayesian optimization (BO) selects evaluation points for expensive black-box objectives using Gaussian process (GP) predictive distributions. Kernel choice and hyperparameter selection can lead to miscalibrated predictive distributions and…
Risk management is an important part of financial practice, essential for protecting assets and investments in modern-day volatile markets. This paper proposes a mixture of mirrored Weibull (MMW) distribution for modelling stock returns and…
In clinical studies, persistence, which measures the duration of time a patient continues to take a prescribed medication without discontinuation, is increasingly recognized as a critical indicator of adherence to medication. Adherence…
Privacy constraints have driven the rise of federated learning (FL), which enables multi-site analyses without sharing individual participant data. We develop a framework for FL with missing data, identifying conditions under which the…
Squared Wasserstein distance is a frequently used tool to measure discrepancy between probability distributions. This distance is typically computed between empirical measures of size $n$ from two underlying random samples. Unfortunately,…
Standard generative models struggle with heavy-tailed data: Lipschitz architectures cannot produce power-law tails from Gaussian noise, and interpolating between heavy-tailed data and Gaussians is ill-posed. We propose a simple fix: apply…
Existing identification results in proximal causal inference often focus on marginal interventional distributions using standard outcome or treatment bridge functions. These methods do not generally identify joint interventional…
In this work, we study the estimation of treatment duration effects in observational survival data, where treatment and covariate histories evolve over time and longer observed durations are only attainable among individuals who remain…
Tolerance limits have received considerable attention in the statistical literature, with applications reaching far beyond their initial role in quality control. The well-known formula of Scheff\'e and Tukey (1944) establishes a simple,…
Stationary subspace analysis (SSA) is a blind source separation framework that decomposes linearly mixed multivariate data into stationary and nonstationary components. We extend SSA to spatially indexed data by introducing spatial…