Statistical Methodology
Confounder selection, namely choosing a set of covariates to control for confounding between a treatment and an outcome, is arguably the most important step in the design of an observational study. Previous methods, such as Pearl's…
Principal component analysis (PCA), the most popular dimension-reduction technique, has been used to analyze high-dimensional data in many areas. It discovers the homogeneity within the data and creates a reduced feature space to capture as…
This paper proposes a new AR-sieve bootstrap approach to high-dimensional time series. The major challenge for classical bootstrap methods on high-dimensional time series is twofold: the curse of dimensionality and temporal dependence. To…
A challenge for practitioners of Bayesian inference is specifying a model that incorporates multiple relevant, heterogeneous data sets. It may be easier to instead specify distinct submodels for each source of data, then join the submodels…
This paper is concerned with the construction of prior-free posterior distributions that rely on one-step-ahead predictive distribution functions. These are typically more straightforward to motivate than prior distributions.…
Bayesian inference is often implemented using approximations, which can yield interval estimates that are too narrow, not fully capturing the uncertainty in the posterior distribution. We address the question of how to adjust these…
Designing clinical trials requires evaluating multiple operating characteristics (OCs), such as the likelihood of an early stopping decision, the probability of detecting a treatment effect, and the Type I error rate. In most cases, these…
Recently, the U.S. Food and Drug Administration (FDA) released draft guidance (FDA, 2026) signaling a paradigm shift that facilitates the use of Bayesian methodology as the primary analysis and decision framework for drug approval. The…
Gaussian processes provide a flexible framework for spatial prediction, but their computational cost limits their applicability to data with a large sample size $n$. Predictive processes (PPs), a popular low-rank approximation, mitigate…
Five-year cancer survival rates are widely reported and often interpreted to mean that early detection saves lives, that a late fatal diagnosis would have been prevented by earlier detection, and that increasing survival over time proves…
Traditional Functional Principal Component Analysis typically focuses on densely observed univariate functional data, yet many applications, particularly in longitudinal studies, involve multivariate functional data observed sparsely and…
For many years, it was routine to use equal model prior probabilities in Bayesian model uncertainty analysis. At least twenty years ago, it became clear that this was problematic, leading to support for models that are much too large in the…
We develop horizon-aware anytime-valid tests and confidence sequences for bounded means under a strict deadline $N$. Using the betting/e-process framework, we cast horizon-aware betting as a finite-horizon optimal control problem with state…
Multiple randomization designs (MRDs) are a class of experimental designs used to handle interference in two-sided marketplaces. We investigate regression adjustment strategies for estimating total, spillover, and direct effects in MRDs. We…
Least Absolute Deviations (LAD) regression provides a robust alternative to ordinary least squares by minimizing the sum of absolute residuals. However, its widespread use has been limited by the computational cost of existing solvers,…
In this paper, we study transfer learning for high-dimensional factor-augmented sparse linear models, motivated by applications in economics and finance where strongly correlated predictors and latent factor structures pose major challenges…
We consider the problem of fully Bayesian posterior estimation and uncertainty quantification in undirected Gaussian graphical models via Markov chain Monte Carlo (MCMC) under recently developed element-wise graphical priors, such as the…
Large observational datasets, including those derived from electronic health records, are a valuable resource for medical research but are often affected by missingness, measurement error, and misclassification. Two-phase sampling with…
Electronic health records (EHR) are widely used to study clinical decisions, yet unmeasured confounding remains a persistent challenge. Proxy variables offer a potential solution. In EHR data, clinicians already record many such…
This study focuses on statistical inference for the class of quasi-infinitely divisible (QID) distributions, which was recently introduced by Lindner, Pan and Sato (2018). The paper presents a Fourier approach, based on the analogue of the…