Statistical Methodology
Learning distributions of longitudinal data is central to tasks such as visualization, completion, classification, and synthetic data generation, but it remains statistically challenging because longitudinal observations are often…
Principal stratification provides a foundational framework for causal inference with intermediate outcomes by defining causal effects within subpopulations, yet existing work has largely focused on average effects across strata rather than…
Standard statistical methods are often inadequate for modeling the joint dependence between linear and circular variables, and existing methods for modeling this dependence are designed only for continuous variables. However, circular data…
We introduce a novel goodness-of-fit (GOF) procedure based on Beta-tree partitions. A Beta-tree produces a data-adaptive partition of the sample space into regions and provides guaranteed finite sample confidence intervals for the…
Time-varying treatment effects, surrogate-identified treatment effects, and mediation effects can all be written as recursive regressions, in which each regression's predicted values become generated outcomes for the next regression. We…
A critical assumption of observational studies is that all confounding variables must be known and sufficiently adjusted for to estimate causal effects. An implicit, and often overlooked, aspect of this assumption is that all confounding…
This paper presents a methodological framework for estimating the comprehensive cohort causal effect (CCCE) in mixed-design clinical studies that combine randomized controlled trials (RCTs) and a parallel observational study (OBS). Our…
[Working Draft] Compositional data are central to microbial, ecological, and environmental research, yet often have four features that are difficult to accommodate jointly: exact zeros, latent dependence among components,…
We present a justification of the use of Inverse Probability Weighting (IPW) in a post-Bayesian framework, in which the bias-correction provided by IPW in a frequentist context is reframed as a reweighting of the Kullback-Leibler (KL)…
Prediction-powered inference (PPI) refers to a two-level situation where the statistician observes a set of $(x,y)$ pairs and another set of $x$s with the responses $y$ missing. Also available is some independent background data from which…
Prediction sets should have high coverage to be useful, but some coverage notions are more practically relevant than others. In the classification setting, class-conditional coverage requires that the prediction set (i.e., the set of…
Median bias reduction of maximum likelihood estimators can substantially improve estimation and inference. Existing generally applicable methods are, however, typically implicit, requiring the solution of nonlinear systems of estimating…
One of the two dominant approaches for univariate extreme value analysis is to model exceedances above a large threshold, the choice of which has a large impact on inference and whose uncertainty is often subsequently ignored. In this…
N-of-1 trials, or time-series experiments, are widely used in clinical research and online platforms. Yet the theoretically optimal design for estimating many treatment effects remains unclear. We propose a simple Markovian framework for…
Meta-analyses of the accuracy of two diagnostic tests typically assume tests are independent conditional on true disease status. This assumption is often unrealistic and violation leads to biased estimates of the accuracy of tests used in…
Quantifying efficacy uncertainty across the entire dose range is crucial in dose-response studies. Although the frequentist simultaneous confidence band (FSCB) is widely used for this purpose, it does not readily incorporate prior…
Background: One of the suggested models for meta-analysis with rare events is the beta-binomial model (BBM). The main advantage of this model compared to inverse-variance models is that it uses information from zero cells without needing a…
We derive augmented inverse probability weighted estimators for occupation probabilities of multistate models under two levels of coarsening: right-censoring and baseline exposure. The key exchangeability assumption for identification is…
In modern applications of linear mixed models, the number of candidate fixed-effects covariates can grow exponentially with the sample size, while dependence induced by random effects and possible data contamination pose substantial…
Switchback experiments and other clustered randomized designs are widely used on online platforms, but the clustered, time-dependent nature of these designs can make standard variance reduction methods behave differently than in standard…