Statistical Methodology
We study empirical Bayes (EB) predictive density estimation in linear mixed models (LMMs) with a large number of units, which induce a high-dimensional random-effects space. Focusing on Kullback-Leibler (KL) risk minimization, we develop a…
Determining the number of change-points is a fundamental first step in change-point detection problems, as it lays the groundwork for subsequent change-point position estimation. While the existing literature offers various methods…
This study develops two robust, quantile-sliced moment systems, the mean absolute deviation (MAD) and median absolute deviation (MedAD) moments, to serve as foundational tools in parametric modeling, statistical inference, and describing distributional…
Semi-supervised learning has attracted significant attention due to the proliferation of applications featuring limited labeled data but abundant unlabeled data. In this paper, we examine the statistical inference problem in an…
We review recently proposed Bayesian approaches for clustering high-dimensional data. After identifying the main limitations of available approaches, we introduce an alternative framework based on vertical consensus inference (VCI) to…
Randomized controlled trials (RCTs) yield internally valid causal effect estimates, but generalizing these results to target populations with different characteristics requires an untestable selection ignorability assumption: conditional on…
Empirical likelihood is an attractive inferential framework that respects natural parameter boundaries, but existing approaches typically require smoothness of the functional and miscalibrate substantially when these assumptions are…
We propose a scalable, provably accurate method for localizing an unknown number of multiple axis-aligned anomalous patches in spatial data under a general class of spatial dependence. Motivated by the practical need to detect localized…
In many statistical settings, two types of data are available: coupled data, which preserve the joint structure among variables but are limited in size due to cost or privacy constraints, and marginal data, which are available at larger…
In this paper, we study properties of penalized and structured M-estimators of multivariate scatter, based on geodesically convex but not necessarily smooth penalty functions. Existence and uniqueness conditions for these penalized and…
Risk assessment of hurricane-driven storm surge relies on deterministic computer models that produce outputs over a large spatial domain. The surge models can often be run at a range of fidelity levels, with greater precision yielding more…
We develop an extreme value framework for CoVaR centered on $v(q \mid p ; C)$, the copula-adjusted probability level, or equivalently, the CoVaR on the uniform (0,1) scale. We characterize the possible tail regimes of $v(q \mid p ; C)$…
Retrospective causal questions ask what would have happened to an observed individual had they received a different treatment. We study the problem of estimating $\mu(x,y)=\mathbb{E}[Y(1)\mid X=x,Y(0)=y]$, the expected counterfactual…
Factor models are widely used for dimension reduction. Bayesian approaches to these models often place a prior on the factor loadings that allows for infinitely many factors, with loadings increasingly shrunk toward zero as the column index…
Many modern products are highly reliable, often exhibiting long lifetimes. As a result, experiments under normal operating conditions can take prohibitively long to yield sufficient failure data for robust statistical…
In large-scale biomedical research, it is common to gather ultra-high-dimensional data that include right-censored survival times. Feature screening has emerged as a crucial statistical technique for handling such data. In this paper, we…
The measurement of human behavior remains a central challenge across the behavioral sciences. Traditional approaches typically rely on passive observation of responses collected under static or weakly controlled conditions, limiting the…
When performing Bayesian inference, we frequently need to work with conditional probability densities. For example, the posterior function is the conditional density of the parameters given the data. Some might worry that conditional…
When treatment policy estimands are of interest, clinical trials often attempt to collect patient data after intercurrent events (ICEs), although such data are often limited. Retrieved dropout imputation methods, which use pre-ICE and…
Understanding the interplay between high-dimensional data from different views is essential in biomedical research, particularly in fields such as genomics, neuroimaging and biobank-scale studies involving high-dimensional features.…