Statistical Methodology
A cornerstone of the multiple testing literature is the Benjamini-Hochberg (BH) procedure, which guarantees control of the FDR when $p$-values are independent or positively dependent. While BH controls the average quality of rejections, it…
The infant microbiome undergoes rapid changes in composition over time and is associated with long-term health outcomes, including immune strength and the risk of allergy and asthma. Modeling the associations between exposures or…
Suppose that the normal model is used for data $Y_1,\ldots,Y_n$, but that the true distribution is a t-distribution with location and scale parameters $\xi$ and $\sigma$ and $m$ degrees of freedom. The normal model corresponds to…
In an attempt to advance the current practice for assessing and predicting the primary ovarian insufficiency (POI) risk in female childhood cancer survivors, we propose two estimating-function-based approaches for age-specific logistic…
The analysis of high-dimensional data, common in fields such as genomics, is complicated by the presence of cellwise contamination, where individual cells rather than entire rows are corrupted. This contamination poses a significant…
Local regression is widely used to explore spatial heterogeneity, but anisotropic or effectively low-dimensional neighborhoods can produce ill-conditioned local solves, causing coefficient variation driven by numerical artifacts rather than…
Missing data is a universal problem in statistics. We develop a unified framework for estimating parameters defined by general estimating equations under a missing-at-random (MAR) mechanism, based on generalized entropy calibration…
Spectral clustering is a popular tool in network data analysis, with applications in a variety of scientific areas. However, many studies have shown that classical spectral clustering does not perform well on certain network…
Treatment effects of stochastic policy shifts quantify differences in outcomes across counterfactual scenarios with varying treatment distributions. Stochastic policy shifts may be of interest in settings where it is unrealistic or…
We propose a new omnibus goodness-of-fit test based on trigonometric moments of probability-integral-transformed data. The test builds on the framework of the LK test introduced by Langholz and Kronmal [J. Amer. Statist. Assoc. 86 (1991),…
While average treatment effects (ATE) and conditional average treatment effects (CATE) provide valuable population- and subgroup-level summaries, they fail to capture uncertainty at the individual level. For high-stakes decision-making,…
Aggregated relational data (ARD) is widely collected to study social networks in fields such as sociology, public health, and economics. Many of the successes of ARD inference have been driven by increasingly complex Bayesian models, which…
Longitudinal binary or count functional data are common in neuroscience, but are often too large to analyze with existing functional regression methods. We propose one-step penalized generalized estimating equations that support…
The long-term consequences for mothers of unwanted pregnancies carried to term have been little explored. We use data from the Wisconsin Longitudinal Study (WLS) and propose a novel approach, namely two team cross-screening, to study the…
Bayesian Poisson Non-Negative Matrix Factorization (NMF) is widely used to model count data, including in cancer mutational signature analysis. However, standard Gibbs samplers rely on computationally expensive Poisson augmentation, and…
Analysis of data from randomized controlled trials in vulnerable populations requires special attention when assessing treatment effect by a score measuring, e.g., disease stage or activity together with onset of prevalent terminal events.…
We provide novel probabilistic portrayals of two multivariate models designed to handle zero-inflation in count-compositional data. We develop a new unifying framework that represents both as finite mixture distributions. One of these…
There is interest in learning about the causal effects of modern contraceptive use on empowerment outcomes. Data on this question often come from family planning (FP) programs that increase access to FP and facilitate contraceptive use…
The Oja depth (simplicial volume depth) is one of the classical statistical techniques for measuring the central tendency of data in multivariate space. Despite the widespread emergence of object data like images, texts, matrices or graphs,…
Gaussian graphical models (GGMs) are widely used to recover the conditional independence structure among random variables. Recent work has sought to incorporate auxiliary covariates to improve estimation, particularly in applications such…