Statistics Theory
In this study, we derive the exact distribution and moment of the noncentral complex Roy's largest root statistic, expressed as a product of complex zonal polynomials. We show that the linearization coefficients arising from the product of…
In the setting of multiple testing, compound p-values generalize p-values by asking for superuniformity to hold only \emph{on average} across all true nulls. We study the properties of the Benjamini--Hochberg procedure applied to compound…
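For orientation, the standard Benjamini--Hochberg step-up rule on ordinary p-values can be sketched as follows. This is a minimal illustration of the classical procedure only, not the compound p-value analysis from the abstract; the function name `benjamini_hochberg` and the level `alpha` are illustrative choices.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking the hypotheses rejected by the
    classical Benjamini-Hochberg step-up rule at nominal level alpha."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= alpha * k / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            k = rank
    # Reject every hypothesis whose p-value is among the k smallest.
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value meets a later threshold.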
In information geometry, statistical models are considered as differentiable manifolds, where each probability distribution represents a unique point on the manifold. A Riemannian metric can be systematically obtained from a divergence…
We establish a general, non-asymptotic error analysis framework for understanding the effects of incremental approximations made by practical approaches for Bayesian sequential learning (BSL) on their long-term inference performance. Our…
We provide finite-sample distribution approximations, uniform in the parameter, for inference in linear mixed models. The focus is on variances and covariances of random effects in cases where existing theory fails because their…
We develop a toolbox for exact analysis of iterative algorithms on a class of high-dimensional nonconvex optimization problems with random data. While prior work has shown that low-dimensional statistics of (generalized) first-order methods…
We assess advantages of expressing tree-structured Ising models via their mean parameterization rather than their commonly chosen canonical parameterization. This includes fixedness of marginal distributions, often convenient for dependence…
The signal plus noise model $H=S+Y$ is a fundamental model in signal detection when a low rank signal $S$ is polluted by noise $Y$. In the high-dimensional setting, one often uses the leading singular values and corresponding singular…
We develop early stopping rules for growing regression tree estimators. The fully data-driven stopping rule is based on monitoring the global residual norm. The best-first search and the breadth-first search algorithms together with linear…
The symmetric binary perceptron ($\mathrm{SBP}_{\kappa}$) problem with parameter $\kappa : \mathbb{R}_{\geq1} \to [0,1]$ is an average-case search problem defined as follows: given a random Gaussian matrix $\mathbf{A} \sim…
We consider sparse matrix estimation where the goal is to estimate an $n\times n$ matrix from noisy observations of a small subset of its entries. We analyze the estimation error of the widely used collaborative filtering algorithm…
Given an i.i.d. sample drawn from some probability distribution on a finite set, the best (in the sense of least variance) linear unbiased estimator (BLUE) of the average of any quantity with respect to that distribution is the sample…
In this paper, we explicitly derive unbiased estimators for various functions of the rate parameter of the exponential distribution in the absence of a location parameter, including powers of the rate parameter, the $q$th quantile, the…
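As a point of reference for the simplest case, powers of the rate parameter, the following is a standard textbook computation and not necessarily the derivation used in the paper. The symbols $X_1,\dots,X_n$, $S$, $\lambda$ and $k$ are notation assumed here for illustration: an i.i.d. sample from the exponential distribution with rate $\lambda$, its sum, and an integer power $k < n$.

```latex
% S = X_1 + ... + X_n follows a Gamma(n, \lambda) distribution, so for k < n
\mathbb{E}\!\left[S^{-k}\right]
  = \int_0^\infty s^{-k}\,\frac{\lambda^{n} s^{n-1} e^{-\lambda s}}{\Gamma(n)}\,ds
  = \lambda^{k}\,\frac{\Gamma(n-k)}{\Gamma(n)} .
% Hence the following estimator is unbiased for \lambda^k:
\widehat{\lambda^{k}} = \frac{\Gamma(n)}{\Gamma(n-k)}\,S^{-k},
\qquad\text{and in particular}\qquad
\widehat{\lambda} = \frac{n-1}{\sum_{i=1}^{n} X_i}\quad (n \ge 2).
```

The case $k=1$ shows that the maximum likelihood estimator $n/S$ is biased upward by the factor $n/(n-1)$.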
Determinantal point processes (DPPs for short) are a class of repulsive point processes. They have found some statistical applications to model spatial point pattern datasets with repulsion between close points. In the case of DPPs on…
Single-parameter summaries of variable effects in regression settings are desirable for ease of interpretation. However, (partially) linear models, for example, which would deliver these, may fit the data poorly. On the other hand, an…
In many applications, particularly in the natural sciences, the available high-dimensional set of features may contain variables that are not correlated with the response under consideration. Such irrelevant features can, in certain cases,…
We develop a criterion to certify whether causal effects are identifiable in linear structural equation models with latent variables. Linear structural equation models correspond to directed graphs whose nodes represent the random variables…
By introducing a weight function into the density power divergence, we develop a new class of robust and smooth estimators for the tail index of Pareto-type distributions, offering improved efficiency in the presence of outliers. These…
For one-parameter continuous exponential families, we identify an unbiased estimator of the inverse of the natural parameter $\theta$ for cases where $\theta > 0$, extending an earlier result of \cite{voinov1985unbiased} applicable to a…
Variational inference is a general framework to obtain approximations to the posterior distribution in a Bayesian context. In essence, variational inference entails an optimization over a given family of probability distributions to choose…