Statistical Methodology
Principal component analysis (PCA) is a fundamental tool in multivariate statistics, yet its sensitivity to outliers and limitations in distributed environments restrict its effectiveness in modern large-scale applications. To address these…
In the era of big data, integrating multi-source functional data to extract a subspace shared across sources has attracted considerable attention. In practice, data collection procedures often follow…
Neal's funnel refers to an exponential tapering in probability densities common to Bayesian hierarchical models. Standard sampling methods, such as Markov chain Monte Carlo, struggle to sample the funnel efficiently. Reparameterizing the model…
Most causal inference methods focus on estimating marginal average treatment effects, but many important causal estimands depend on the joint distribution of potential outcomes, including the probability of causation and proportions…
Granger causality is popular for analyzing time series data in many applications, from the natural sciences to the social sciences, including genomics, neuroscience, economics, and finance. Consequently, the Granger causality test has become one of the…
Regression discontinuity design (RDD) is a widely used framework for identifying and estimating causal effects at the cutoff of a single running variable. In practice, however, decision-making often involves multiple thresholds and…
The shapes of functions provide highly interpretable summaries of their trajectories. This article develops a novel transfer learning methodology to tackle the challenge of data scarcity in functional linear models. The methodology…
The I-SPY2 phase 2 clinical trial is a long-running platform trial that evaluates neoadjuvant treatments for locally advanced breast cancer, assigning subjects to novel agents using response-adaptive randomization. Recently, I-SPY2 was…
Accounting for dependence among high-dimensional variables in omics data analysis is critical to obtain accurate and reliable statistical inference. Although latent, omics variables often exhibit structured correlation/co-expression…
Motivated by the study of state opioid policies, we propose a novel approach that uses autoregressive models for causal effect estimation in settings with panel data and staggered treatment adoption. Specifically, we seek to estimate the…
A/B testing evaluates the difference between a control and a treatment in a statistically rigorous manner. Continuous monitoring allows statistical evaluation of an A/B test as it proceeds. One goal of continuous monitoring is early stopping…
A critical literature review and a comprehensive simulation study are used to show that (a) the non-parametric bootstrap is a viable alternative to commonly taught and used methods in basic estimation tasks (mean, variance, quartiles, correlation)…
This paper develops a statistical framework for goodness-of-fit testing of volatility functions in McKean-Vlasov stochastic differential equations, which describe large systems of interacting particles with distribution-dependent dynamics.…
We propose a novel framework for reconstructing the chronology of genetic regulation using causal inference based on Pearl's theory. The approach proceeds in three main stages: causal discovery, causal inference, and chronology…
Nonlinear and delayed effects of covariates often render time series forecasting challenging. To this end, we propose a novel forecasting framework based on ridge regression with signature features calculated on sliding windows. These…
In precision medicine, one of the most important problems is estimating optimal individualized treatment rules (ITRs), which typically involves recommending treatment decisions based on fully observed individual characteristics of…
Functional logistic regression is a popular model to capture a linear relationship between a binary response and functional predictor variables. However, many methods used for parameter estimation in functional logistic regression are…
Reconstructing evolutionary histories and estimating the rate of evolution from molecular sequence data are of central importance in evolutionary biology and infectious disease research. We introduce a flexible Bayesian phylogenetic…
Contrastive dimension reduction (CDR) methods aim to extract signal unique to or enriched in a treatment (foreground) group relative to a control (background) group. This setting arises in many scientific domains, such as genomics, imaging,…
We present a new method for the statistical process control of lattice structures using tools from Topological Data Analysis. Motivated by applications in additive manufacturing, such as aerospace components and biomedical implants, where…