Statistics Theory
We develop asymptotic theory for principal component analysis (PCA) of a high-dimensional factor model in which the working dimension $R$ is fixed and only required to satisfy $R \ge r$, where $r$ is the true number of factors. Building on…
In this study, we consider sequences drawn from time-homogeneous Markov chains and introduce a novel approach for estimating first hitting-time distributions to specified terminal states. Our methodology is based on the…
This article presents a theoretical study of uncertainty functionals on general measurable spaces. These functionals are fundamental in experimental design and global sensitivity analysis, where they are used to quantify variability and…
Self-distillation has emerged as a promising technique for improving model performance in modern machine learning systems. We develop the statistical foundations of self-distillation in spiked covariance models, by introducing and analyzing…
This paper extends a recently proposed family of EDF-based goodness-of-fit procedures for the hypercube $[0,1]^p$ (the m-test and the s-test), which are based on a unique deconstruction of the $p$-parameter Brownian sheet into independent…
This article proposes a new index for quantifying the degree of dependence between random vectors. The index takes values in [0,1] and equals zero if and only if the random vectors are sub-independent. Unlike mere uncorrelatedness,…
Survival analysis is widely used in applications involving sensitive individual-level data, yet differentially private hypothesis testing for right-censored data remains largely undeveloped. We initiate a finite-sample theory of private…
Machine learning systems increasingly face requirements to forget not only individual data points, but entire domains of information, such as toxic language, copyrighted corpora, or demographic biases. This raises a fundamental dilemma of…
We study asymptotic anytime-valid confidence sequences for degree-two U-statistics under continuous monitoring. In the nondegenerate case, Hoeffding's projection reduces the problem to a time-uniform central limit theory for the partial…
We introduce a novel approach to finite sample robustness that avoids the pessimism of traditional breakdown analyses. We define the threshold breakdown point, the smallest contamination fraction needed to induce a prescribed deviation, and…
The choice of the tuning parameter in the Lasso is central to its statistical performance in high-dimensional linear regression. In this work, we study tuning regimes under which the Lasso exhibits suboptimal prediction performance, in the…
Unlinked regression, in which covariates and responses are observed separately without known correspondence, has recently gained increasing attention. Deconvolution, on the other hand, is a fundamental and challenging problem in…
This paper develops a Catoni-type joint (tuning-free) estimation framework for parametric models with heavy-tailed noise, in which the target parameter and the unknown noise variance are estimated simultaneously through a system of two…
Wasserstein metrics are increasingly being used as similarity scores for images treated as discrete measures on a grid, yet their behavior under noise remains poorly understood. In this work, we consider the sensitivity of the signed…
The Komlós–Major–Tusnády (KMT) inequality for partial sums is one of the most celebrated results in probability theory. Yet its practical application has been hindered by a lack of practical constants.…
We consider the problem of testing the mean of high-dimensional data when the dimension may grow without explicit rate restrictions relative to the sample size. The proposed procedure is based on the statistic $V_n = n\|X_n\|^2$, which avoids…
Expectations of multivariate functions with missing labels occur in various fields such as transfer learning and average treatment effects. Although non-parametric estimators based on nearest-neighbour matching are frequently used in this…
We study the classical problem of community recovery in stochastic block models with a fixed number of communities, with a twist: We seek algorithms that are stable with respect to node-wise changes in the graph structure, formally defined…
We introduce a class of Lévy-driven graph Ornstein-Uhlenbeck (grOU) models for edge-indexed network time series. The proposed framework extends generalized network autoregressive (GNAR) processes for edge-indexed network time series to…
Suppose we have an observed path from a point process counting event occurrences in a large population. Based on the observed path, we would like to test the null hypothesis that the conditional intensity of the point process belongs to a…