Statistics
Rubin multiple imputation (MI) generates plausible data completions to account for uncertainty and statistical variability but provides little insight into their global organization. We introduce a topological reconstruction approach that…
Modern multivariate regression problems involve several related outcomes whose regression effects are nonlinear, heterogeneous, and outcome-specific, and whose residual dependence among outcomes is scientifically…
Learning distributions of longitudinal data is central to tasks such as visualization, completion, classification, and synthetic data generation, but it remains statistically challenging because longitudinal observations are often…
This paper introduces the R package spca, which provides a computational framework for least squares sparse principal component analysis (LS-SPCA). Unlike other SPCA methods, LS-SPCA generates uncorrelated sparse principal components (sPCs)…
A growing number of techniques leverage the spatial structures that underlie many real-world datasets. Despite these advances, the complementary task of estimating spatial structures and understanding their role within these techniques has…
Principal stratification provides a foundational framework for causal inference with intermediate outcomes by defining causal effects within subpopulations, yet existing work has largely focused on average effects across strata rather than…
Standard statistical methods are often inadequate for modeling the joint dependence between linear and circular variables, and existing methods for modeling this dependence are designed only for continuous variables. However, circular data…
We introduce a novel goodness-of-fit (GOF) procedure based on Beta-tree partitions. A Beta-tree produces a data-adaptive partition of the sample space into regions and provides guaranteed finite sample confidence intervals for the…
Time-varying treatment effects, surrogate-identified treatment effects, and mediation effects can all be written as recursive regressions, in which each regression's predicted values become generated outcomes for the next regression. We…
Predicting the aerodynamic performance (e.g. lift, drag, and moment coefficients) of an aircraft is challenging -- computational models are biased and direct simulations are prohibitive. A pragmatic way to overcome this limitation is by…
The common factor analytic model is related to Helmholtz and Boltzmann machines, can be conceived as a linear autoencoder, or can be thought of as a single-hidden-layer generative neural network. We thus consider it a basal generative…
Biomedical research is increasingly relying on readily available routine data, such as electronic health records. Routinely collected data, as well as datasets from large cohorts, are often prone to measurement error which, if not addressed…
We study the leading-order fluctuation of stochastic gradient Euler-Maruyama estimators for generalized non-reversible Langevin dynamics. Under structural assumptions tailored to the small-stepsize central limit theorem and under an…
A critical assumption of observational studies is that all confounding variables must be known and sufficiently adjusted for to estimate causal effects. An implicit, and often overlooked, aspect of this assumption is that all confounding…
This paper presents a methodological framework for estimating the comprehensive cohort causal effect (CCCE) in mixed-design clinical studies that combine randomized controlled trials (RCTs) and parallel observational studies (OBS). Our…
[Working Draft] Compositional data are central to microbial, ecological, and environmental research, yet often have four features that are difficult to accommodate jointly: exact zeros, latent dependence among components,…
We present a justification of the use of Inverse Probability Weighting (IPW) in a post-Bayesian framework, in which the bias-correction provided by IPW in a frequentist context is reframed as a reweighting of the Kullback-Leibler (KL)…
Online high-dimensional regression requires algorithms that can update sequentially while preserving structural sparsity. We propose Adaptive Iterative Hard Thresholding (AIHT), an online sparse-regression framework that alternates…
Prediction-powered inference (PPI) refers to a two-level situation where the statistician observes a set of $(x,y)$ pairs and another set of $x$s with the responses $y$ missing. Also available is some independent background data from which…
Prediction sets should have high coverage to be useful, but some coverage notions are more practically relevant than others. In the classification setting, class-conditional coverage requires that the prediction set (i.e., the set of…