Statistical Theory
COVID-19 has had a large-scale negative impact on opioid users, worsening the health of an already vulnerable population. Critical information on the total impact of COVID-19 on opioid users is unknown due to a lack of…
Calibration refers to the statistical estimation of unknown model parameters in computer experiments, such that computer experiments can match underlying physical systems. This work develops a new calibration method for imperfect computer…
Hybrid Gibbs samplers are a prominent class of approximate Gibbs algorithms that use Markov chains to approximate conditional distributions, with the Metropolis-within-Gibbs algorithm standing out as a well-known example. Despite…
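For readers unfamiliar with the Metropolis-within-Gibbs scheme named above, the following is a minimal sketch and not the method analyzed in the listed paper. It assumes a toy target (a standard bivariate normal with correlation `rho`) and replaces each exact conditional draw with a single random-walk Metropolis step. The function names, the step size, and the target are all illustrative assumptions.

```python
import math
import random


def log_target(x, y, rho):
    """Unnormalized log-density of a standard bivariate normal (assumed toy target)."""
    return -(x * x - 2.0 * rho * x * y + y * y) / (2.0 * (1.0 - rho * rho))


def metropolis_within_gibbs(n_iter, rho=0.5, step=1.0, seed=0):
    """Alternate one random-walk Metropolis step per coordinate.

    Each coordinate update targets the conditional of that coordinate given
    the other, because the other coordinate is held fixed in the ratio.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    samples = []
    for _ in range(n_iter):
        # Metropolis step for x given the current y.
        prop = x + rng.gauss(0.0, step)
        log_u = math.log(1.0 - rng.random())  # 1 - U lies in (0, 1], so log is safe
        if log_u < log_target(prop, y, rho) - log_target(x, y, rho):
            x = prop
        # Metropolis step for y given the (possibly updated) x.
        prop = y + rng.gauss(0.0, step)
        log_u = math.log(1.0 - rng.random())
        if log_u < log_target(x, prop, rho) - log_target(x, y, rho):
            y = prop
        samples.append((x, y))
    return samples
```

Calling `metropolis_within_gibbs(20000)` returns a list of `(x, y)` pairs whose empirical mean and variance should be close to 0 and 1 for each coordinate under this assumed target.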
Estimating the state of a dynamical system from partial and noisy observations is a ubiquitous problem in a large number of applications, such as probabilistic weather forecasting and prediction of epidemics. Particle filters are a widely…
Given an arbitrary subgraph $H=H_n$ and $p=p_n \in (0,1)$, the planted subgraph model is defined as follows. A statistician observes the union of a random copy $H^*$ of $H$, together with random noise in the form of an instance of an…
Undirected graphical models are a widely used class of probabilistic models in machine learning that capture prior knowledge or putative pairwise interactions between variables. Those interactions are encoded in a graph for pairwise…
We compute asymptotic non-linear shrinkage formulas for covariance and precision matrix estimators for weighted sample covariances, and the joint sample-population eigenvector overlap distribution, in the spirit of Ledoit and Péché. We…
The problem of combining p-values is an old and fundamental one, and the classic assumption of independence is often violated or unverifiable in many applications. There are many well-known rules that can combine a set of arbitrarily…
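As a concrete illustration of combination rules that do not rely on independence, here is a minimal sketch of two classical ones: the Bonferroni rule, min(1, n · min p), and the twice-the-mean rule, min(1, 2 · mean p). Both are valid under any dependence among the p-values. The function names are hypothetical, and this is not the procedure proposed in the listed paper.

```python
def bonferroni_combine(p_values):
    """Bonferroni combination: min(1, n * smallest p-value)."""
    _check(p_values)
    return min(1.0, len(p_values) * min(p_values))


def twice_mean_combine(p_values):
    """Twice-the-arithmetic-mean combination: min(1, 2 * mean p-value)."""
    _check(p_values)
    return min(1.0, 2.0 * sum(p_values) / len(p_values))


def _check(p_values):
    # Reject empty input and values outside [0, 1].
    if len(p_values) == 0:
        raise ValueError("need at least one p-value")
    if any(p < 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p-values must lie in [0, 1]")
```

For example, with p-values 0.01, 0.2, and 0.5, the Bonferroni rule gives about 0.03, while the twice-the-mean rule gives about 0.473. The two rules can differ sharply depending on whether the evidence is concentrated in one small p-value or spread across many.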
Regularized generalized canonical correlation analysis (RGCCA) generalizes regularized canonical correlation analysis to three or more sets of variables. It is a component-based approach aiming to study the relationships…
Simultaneous statistical inference has been a cornerstone of the statistical methodology literature because of its fundamental theory and paramount applications. The mainstream multiple testing literature has traditionally considered two…
In subgroup analysis, testing the existence of a subgroup with a differential treatment effect serves as protection against spurious subgroup discovery. Despite its importance, this hypothesis test is complicated in nature:…
SCoTLASS is the first sparse principal component analysis (SPCA) model, which imposes extra $\ell_1$-norm constraints on the measured variables to obtain sparse loadings. Due to the difficulty of finding projections on the intersection of an…
We consider estimating the proportion of random variables for two types of composite null hypotheses: (i) the means of the random variables belonging to a non-empty, bounded interval; (ii) the means of the random variables belonging to an…
This work extends local linear regression to Banach space-valued time series for estimating smoothly varying means and their derivatives in non-stationary data. The asymptotic properties of both the standard and bias-reduced Jackknife…
We view penalized risks through the lens of the calculus of variations. We consider risks composed of a fitness term (e.g., MSE) and a gradient-based penalty. After establishing the Euler-Lagrange field equations as a systematic approach to…
In reliability theory and survival analysis, observed data are often weakly dependent and subject to additive measurement errors. Such contamination arises when the underlying data are neither independent nor strongly mixed but instead…
In statistical inference, confidence set procedures are typically evaluated based on their validity and width properties. Even when procedures achieve rate-optimal widths, confidence sets can still be excessively wide in practice due to…
We revisit the classical broken sample problem: two samples of i.i.d. data points $\mathbf{X}=\{X_1,\ldots,X_n\}$ and $\mathbf{Y}=\{Y_1,\ldots,Y_m\}$ are observed without correspondence, with $m\leq n$. Under the null hypothesis,…
We consider statistical inference in a semi-supervised setting where we have access to both a labeled dataset consisting of pairs $\{(X_i, Y_i)\}_{i=1}^n$ and an unlabeled dataset $\{X_i\}_{i=n+1}^{n+N}$. We ask the question: under what…
This paper studies a Bayesian estimation procedure for single-hidden-layer neural networks using $\ell_{1}$ controlled weights. We study the structure of the posterior density and provide a representation that makes it amenable to rapid…