统计方法学
Longitudinal studies frequently incorporate covariates that evolve over time, creating complex dependence structures between outcomes and predictors. When covariates are time dependent, standard power analysis tools--largely developed for…
Pairwise human-preference platforms such as Chatbot Arena have become central to large language model (LLM) evaluation, yet reliable task-specific ranking remains challenging. Global leaderboards mask task heterogeneity, while ranking each…
This paper develops a framework for differentially private $e$-values under Gaussian differential privacy ($\mu$-GDP). We characterize the canonical noise mechanism, establishing that optimal multiplicative perturbation follows a Gaussian…
We consider time to treatment initiation. This can commonly occur in preventive medicine, such as disease screening and vaccination; it can also occur with non-fatal health conditions such as HIV infection without the onset of AIDS. While…
Exact Kriging and conditional simulation (CS) for uncertainty quantification are computationally infeasible for modern spatial analyses with large numbers of observations and dense prediction grids. We present a rapid approximation to the…
Regression is a fundamental tool in scientific research. Ordinary least squares (OLS), one of the most widely used regression methods, enjoys several desirable properties, including the best linear unbiased estimator (BLUE) property. It is…
Besides the classical motivation of fusing evidence from multiple sources, modern inferential procedures based on randomization, resampling, and data splitting often introduce analyst-generated multiplicity, where aggregating outputs across…
Conformal prediction is a framework for providing prediction intervals with distribution-free validity, guaranteeing predictive coverage for data drawn from any distribution. Its two main variants are full conformal prediction and split…
Response times collected in computerised assessments provide information about the underlying response process and may exhibit within-person variation over the course of a test. We propose a latent variable model for log response times that…
Generalized additive index models (GAIMs) offer a flexible semiparametric framework for capturing complex data relationships, balancing the interpretability of parametric models with the flexibility of nonparametric approaches. However,…
When surveillance data of infectious disease incidence (e.g. weekly case counts) are disaggregated by demographic indicators, disparities in long-run health outcomes between these groups become apparent. Accurate identification of high-risk…
We study causal inference for time-to-event outcomes under right censoring in the presence of unmeasured confounding. Focusing on structural accelerated failure time models, we develop an identification and inference framework that exploits…
Difference-in-differences (DiD) is a cornerstone of causal inference, yet extending it to functional outcomes is not a routine scalar generalization; rather, it entails three fundamental challenges in identification, inference, and…
In regression problems where covariates are naturally organized in a hierarchical tree structure, a central challenge is to select the resolution at which covariates enter the model. Determining this level of feature aggregation is of…
Michaelis--Menten analysis is often conducted by nonlinear least squares under a constant-variance assumption, even though enzyme-kinetic data frequently display concentration-dependent heteroscedasticity and often include repeated or…
We study the problem of estimating the effect function for a continuous treatment, which maps each treatment value to a population-averaged outcome. A central challenge in this setting is confounding: treatment assignment often depends on…
Randomized saturation designs are two-stage experiments: they first randomly assign treatment probabilities over the clusters and then randomly assign the treatment to the units within the clusters. The existing literature on randomized…
We present the winning strategy for the EVA2025 Data Challenge, which aimed to estimate the probability of extreme precipitation events. These events occurred at most once in the dataset making the challenge fundamentally one of…
Human migration exhibits complex spatiotemporal dependence driven by environmental and socioeconomic forces. Modeling such patterns at scale requires methods that accommodate many random effects while remaining feasible when raw data or…
Bayesian inference provides principled uncertainty quantification but is often limited by the challenges of prior and likelihood elicitation. The martingale posterior (MGP) (Fong et al., 2023) offers an alternative by replacing these…