Statistical Theory
Strict minimum message length (SMML) is an information-theoretic coding principle that represents a continuous statistical model by a finite set of assertions and a partition of the sample space. We show that the SMML objective decomposes…
Singular learning theory characterizes Bayesian models with non-identifiable parameterizations through two central quantities: the real log canonical threshold (RLCT), which governs marginal likelihood asymptotics, and the singular…
We study parallel sampling from high-dimensional strongly log-concave distributions. Langevin-based samplers converge rapidly in continuous time, but their discretizations are typically sequential and often require polynomially many steps…
Graphical LASSO (GLASSO) is a widely used method for estimating sparse precision matrices and learning undirected graphical models in high-dimensional settings. Because GLASSO penalizes entries of the precision matrix directly, however, it…
The concept of statistical depth extends the notions of the median and quantiles to other statistical models. These procedures aim to formalize the idea of identifying deeply embedded fits to a model that are less influenced by…
Spectral methods have myriad applications in high-dimensional statistics and data science, and while previous works have primarily focused on $\ell_2$ or $\ell_{2,\infty}$ eigenvector and singular vector perturbation theory, in many…
Anytime-valid tests allow evidence to be checked during data collection: one can either continue testing or stop and reject the null while still controlling type-I error. Yet, in many applications rejection is useful only if it comes soon…
We study the minimax estimation of covariance eigenfunctions and eigenvalues in functional principal component analysis when $n$ trajectories are observed at $p$ common grid points with additive noise. We consider covariance kernels with…
Langevin sampling from distributions of the form $p(x) \propto \exp(-\Psi(x))$ faces two major challenges: (global) mode coverage and (local) mode exploration. The first challenge is particularly relevant for multi-modal distributions with…
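As a point of reference for the setting above, the following is a minimal sketch of the standard unadjusted Langevin algorithm (ULA) for a target $p(x) \propto \exp(-\Psi(x))$. It is not the method proposed in the abstract. The function name `ula_chain`, the one-dimensional potential $\Psi(x) = x^2/2$, the step size, and the burn-in length are all illustrative assumptions.

```python
import math
import random


def ula_chain(grad_psi, x0, step, n_steps, seed=0):
    """Run a 1-D unadjusted Langevin chain targeting p(x) ∝ exp(-Psi(x)).

    Update rule: x <- x - step * grad_psi(x) + sqrt(2 * step) * N(0, 1).
    All names and defaults here are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * step)
    x = x0
    chain = []
    for _ in range(n_steps):
        x = x - step * grad_psi(x) + noise_scale * rng.gauss(0.0, 1.0)
        chain.append(x)
    return chain


# Assumed toy target: Psi(x) = x^2 / 2, so grad Psi(x) = x (standard Gaussian).
chain = ula_chain(lambda x: x, x0=3.0, step=0.1, n_steps=50_000)
kept = chain[1_000:]  # discard an assumed burn-in period
mean = sum(kept) / len(kept)
var = sum((x - mean) ** 2 for x in kept) / len(kept)
```

For this unimodal toy target the chain mixes quickly. Because the discretization is not corrected by an accept/reject step, its stationary variance is $1/(1 - h/2)$ for step size $h$, slightly above 1. On a multi-modal potential the same update can stay near one mode for a long time, which is the mode-coverage difficulty the abstract refers to.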
In this paper, we investigate the supremum-norm generalization error and the uniform inference for a specific class of kernel regression methods, namely the kernel gradient flows. Under the widely adopted capacity-source condition framework…
We study nonparametric estimation of Schrödinger bridge (SB) drifts from i.i.d. data observed on a single time interval. Starting from the conditional-ratio form of the Schrödinger bridge time-series (SBTS) drift formula, we analyze a…
Given $n$ i.i.d. samples from an unknown discrete distribution over an unknown set, the unseen species problem is to predict how many new outcomes would be observed in $m$ additional samples. For small $m$ we show that the Good-Toulmin…
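For orientation, the classical Good-Toulmin estimator can be sketched as follows. With $t = m/n$ and $\Phi_i$ the number of distinct outcomes seen exactly $i$ times, it predicts $-\sum_{i \ge 1} (-t)^i \Phi_i$ new outcomes. This is the textbook form only, not the refinement analyzed in the abstract. The function name `good_toulmin` and the toy sample are illustrative assumptions.

```python
from collections import Counter


def good_toulmin(samples, m):
    """Classical Good-Toulmin estimate of the number of new outcomes in m more draws.

    Computes -sum_i (-t)^i * Phi_i with t = m / n, where Phi_i is the number
    of distinct outcomes observed exactly i times among the n samples.
    (Hypothetical helper; the series is known to be unstable for t > 1.)
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    t = m / n
    multiplicities = Counter(samples)                 # outcome -> count
    prevalences = Counter(multiplicities.values())    # i -> Phi_i
    return -sum((-t) ** i * phi for i, phi in prevalences.items())


# Toy data (assumed): "a" seen twice, "b" and "c" once each, so Phi_1 = 2, Phi_2 = 1.
observed = ["a", "a", "b", "c"]
estimate_same_size = good_toulmin(observed, m=4)   # t = 1
estimate_half_size = good_toulmin(observed, m=2)   # t = 0.5
```

With $t = 1$ the estimate is $\Phi_1 - \Phi_2 = 1$, and with $t = 0.5$ it is $0.5\,\Phi_1 - 0.25\,\Phi_2 = 0.75$.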
We investigate the problem of density estimation on the unit circle and the unit sphere from a computational perspective. Our primary goal is to develop new density estimators that are both rate-optimal and computationally efficient for…
We consider the problem of sequential (online) estimation of a single change point in a piecewise linear regression model under a Gaussian setup. We demonstrate that certain CUSUM-type statistics attain the minimax optimal rates for…
We investigate the monotone representation and measurability of generalized $\psi$-estimators introduced by the authors in 2022. Our first main result, applying the unique existence of a generalized $\psi$-estimator, allows us to construct…
We consider the problem of testing the mean of a bounded real random variable. We introduce a notion of optimal classes for e-variables and e-processes, and establish the optimality of the coin-betting formulation among e-variable-based…
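The coin-betting formulation mentioned above can be illustrated with a minimal sketch for observations in $[0, 1]$ and the null hypothesis $\mathbb{E}[X] = \mu_0$. It assumes a fixed bet $\lambda$, whereas practical schemes usually adapt it. The wealth $W_t = \prod_{i \le t} (1 + \lambda (X_i - \mu_0))$ is a nonnegative martingale under the null, so by Ville's inequality rejecting once $W_t \ge 1/\alpha$ controls the type-I error at level $\alpha$. The names `betting_wealth` and `first_rejection` and the toy data are hypothetical.

```python
def betting_wealth(xs, mu0, lam):
    """Wealth path of a fixed-fraction bet against H0: E[X] = mu0, with X in [0, 1].

    The bet lam must lie in [-1/(1 - mu0), 1/mu0] so that wealth stays nonnegative.
    """
    if not 0.0 < mu0 < 1.0:
        raise ValueError("mu0 must lie strictly between 0 and 1")
    if not -1.0 / (1.0 - mu0) <= lam <= 1.0 / mu0:
        raise ValueError("lam outside the range that keeps wealth nonnegative")
    wealth = 1.0
    path = []
    for x in xs:
        wealth *= 1.0 + lam * (x - mu0)
        path.append(wealth)
    return path


def first_rejection(path, alpha):
    """Return the first time (1-indexed) the wealth reaches 1/alpha, or None."""
    threshold = 1.0 / alpha
    for t, w in enumerate(path, start=1):
        if w >= threshold:
            return t
    return None


# Toy data (assumed): ten observations all equal to 1, tested against mu0 = 0.5.
path = betting_wealth([1.0] * 10, mu0=0.5, lam=1.0)
stop_time = first_rejection(path, alpha=0.05)

# Data consistent with the null: wealth never moves, so there is no rejection.
null_path = betting_wealth([0.5] * 10, mu0=0.5, lam=1.0)
null_stop = first_rejection(null_path, alpha=0.05)
```

In the first toy run each observation multiplies the wealth by 1.5, so the threshold $1/\alpha = 20$ is first crossed at $t = 8$, since $1.5^7 \approx 17.1$ and $1.5^8 \approx 25.6$.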
Despite the popularity and practical success of total variation (TV) regularization for function estimation, surprisingly little is known about its theoretical performance in a statistical setting. While TV regularization has been known for…
We study the relationship between measures of non-exchangeability $\mu_p$ ($p\in[1,+\infty]$), in the sense of Durante et al. (2010), and classical dependence functionals for bivariate copulas. We show that the symmetrization…
Quantile shares, introduced by Babichenko, Feldman, Holzman, and Narayan [STOC 2024], offer an ordinal, self-maximizing, and interpretable benchmark for fair division of indivisible goods, but their universal feasibility is known only…
Estimating unknown parameters subject to prior constraints is important in statistical inference, particularly in fields such as reliability analysis, survival studies, and engineering, where prior structural information about the…