Statistics Methodology
In observational causal inference, domain knowledge often leaves multiple covariate adjustments plausible, yet which sets satisfy ignorability is untestable. Different adjustment sets can yield conflicting estimates of the average treatment…
We develop a unified framework for automatic debiased machine learning (autoDML) for inference on a broad class of statistical parameters. The framework applies to any smooth functional of a nonparametric M-estimand, defined as the…
Post-clustering inference in single-cell RNA sequencing (scRNA-seq) analysis presents significant challenges in controlling Type I error during differential expression analysis. Data fission, a promising approach that aims to split data…
We consider regression in which one predicts a response $Y$ with a set of predictors $X$ across different experiments or environments. This is a common setup in many data-driven scientific fields and we argue that statistical inference can…
To estimate the causal effect of an intervention, researchers need to identify a control group that represents what might have happened to the treatment group in the absence of that intervention. This is challenging without a randomized…
Prediction-powered inference (PPI) is a rapidly growing framework for combining machine learning predictions with a small set of gold-standard labels to conduct valid statistical inference. In this article, I argue that the core estimators…
Longitudinal cluster randomized trials (L-CRTs) are increasingly used to evaluate the cost-effectiveness of healthcare interventions across multiple assessment periods, yet design methods for powering these trials remain underdeveloped.…
Distributed lag non-linear models (DLNMs) are a popular approach to flexibly model the effect of time-delayed exposures. Classical DLNMs specify a common exposure-lag-response relationship across geographical areas. However, this…
Sparse functional data arise when measurements are observed infrequently and at irregular time points for each subject, often in the presence of measurement error. These characteristics introduce additional challenges for functional…
Biclustering is a powerful unsupervised learning technique for simultaneously identifying coherent subsets of rows and columns in a data matrix, thus revealing local patterns that may not be apparent in global analyses. However, most…
The use of synthetic data to deidentify data and to improve predictive models is well attested. The augmentation of datasets using synthetically generated data is an alluring proposition: in the best case, it generates realistic data…
In Structural Health Monitoring (SHM), sensor measurements and derived features such as eigenfrequencies often exhibit systematic daily patterns and can therefore be naturally represented as functional data. Furthermore, these patterns are…
Motivated by the EVA 2025 Data Challenge, we address the problem of predicting extreme rainfall in the eastern United States using data from a large ensemble of climate model runs. The challenge focuses on three quantities of interest…
We study transfer learning for contextual joint assortment-pricing under a multinomial logit choice model with bandit feedback. A seller operates across multiple related markets and observes only posted prices and realized purchases. While…
This paper investigates testing for deviation of a high-dimensional mean vector $\boldsymbol{\mu}$. In contrast to the standard one-sample significance test of the form: $H_0^\texttt{e} : \boldsymbol{\mu} = \boldsymbol{\mu}_0$ versus…
Designing efficient experiments under practical constraints is critical in both scientific research and industrial practice. Focusing on minimizing the average variance of the parameter estimates, A-optimal designs show advantages in…
Forward regression is a classical and effective tool for variable screening in ultra-high dimensional linear models, but its standard projection-based implementation can be computationally costly and numerically unstable when predictors are…
Record-breaking temperature events now appear frequently in the news, are proffered as evidence of climate change, and often bring significant economic and human impacts. Our previous work undertook the first substantial spatial modelling…
We propose a novel modeling framework for time-evolving networks allowing for long-term dependence in network features that update in continuous time. Dynamic network growth is functionally parameterized via the conditional intensity of a…
We propose a grid-based methodology for online changepoint detection that allows offline changepoint tests to be applied to sequentially observed data. The methodology achieves low update and storage costs by testing for changepoints over a…