Statistical Theory
Let $\pi_1$ and $\pi_2$ be two independent populations, where the population $\pi_i$ follows a bivariate normal distribution with unknown mean vector $\boldsymbol{\theta}^{(i)}$ and common known variance-covariance matrix $\Sigma$, $i=1,2$.…
In this article, inferences about the multicomponent stress-strength reliability are drawn under the assumption that strength and stress follow independent Pareto distributions with different shape parameters $(\alpha_1,\alpha_2)$ and a common scale…
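A small sketch of the quantity being estimated may help. The sketch assumes a Pareto type I law $F(x) = 1-(\sigma/x)^{\alpha}$ for $x \ge \sigma$ (the abstract does not say which Pareto type is used), $k$ i.i.d. strengths $X_j \sim \text{Pareto}(\alpha_1,\sigma)$, and a stress $Y \sim \text{Pareto}(\alpha_2,\sigma)$. The system survives when at least $s$ strengths exceed the stress, so $R_{s,k} = P(\#\{j : X_j > Y\} \ge s)$. Substituting $t = (\sigma/Y)^{\alpha_2} \sim U(0,1)$ gives $R_{s,k} = \frac{\alpha_2}{\alpha_1}\sum_{i=s}^{k}\binom{k}{i} B\!\left(i+\frac{\alpha_2}{\alpha_1},\,k-i+1\right)$, which does not depend on $\sigma$. The function names below are illustrative and are not taken from the paper.

```python
import math
import random


def beta_fn(a, b):
    """Euler Beta function B(a, b), computed through log-gamma for stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def reliability_closed_form(s, k, alpha1, alpha2):
    """R_{s,k} = P(at least s of k strengths exceed the stress).

    Assumption: strengths iid Pareto I(alpha1, sigma), stress Pareto I(alpha2, sigma),
    with the same scale sigma (which cancels out of the formula).
    """
    r = alpha2 / alpha1
    return r * sum(math.comb(k, i) * beta_fn(i + r, k - i + 1)
                   for i in range(s, k + 1))


def reliability_monte_carlo(s, k, alpha1, alpha2, sigma=1.0, n=100_000, seed=0):
    """Simulation check of the same probability under the same assumptions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        # random.paretovariate(a) draws a Pareto I variate with shape a and scale 1.
        y = sigma * rng.paretovariate(alpha2)
        survivors = sum(sigma * rng.paretovariate(alpha1) > y for _ in range(k))
        hits += survivors >= s
    return hits / n
```

For $s = k = 1$ the formula reduces to the familiar $P(X > Y) = \alpha_2/(\alpha_1+\alpha_2)$, and comparing `reliability_closed_form(2, 4, 2.0, 3.0)` with `reliability_monte_carlo(2, 4, 2.0, 3.0, sigma=5.0)` is a quick sanity check that the scale indeed drops out.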
Persistent homology is a tool from Topological Data Analysis (TDA) used to summarize the topology underlying data. It can be conveniently represented through persistence diagrams. Given a noisy signal, common strategies to infer its…
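To make "persistence diagram" concrete, the sketch below computes the simplest case only: the 0-dimensional persistence diagram of the sublevel-set filtration of a sampled 1D signal. It is not the method of the paper. Points are added in increasing order of value, connected components are tracked with union-find, and when two components merge the one born later dies (the "elder rule"). Each pair (birth, death) is a point of the diagram; the global minimum never dies and is reported with death $+\infty$. The function name is hypothetical.

```python
import math


def sublevel_persistence_0d(signal):
    """0-dimensional sublevel-set persistence pairs of a 1D sequence."""
    n = len(signal)
    parent = list(range(n))
    birth = [None] * n          # birth value, meaningful at component roots
    active = [False] * n

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    pairs = []
    for i in sorted(range(n), key=lambda j: signal[j]):
        active[i] = True
        birth[i] = signal[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and active[j]:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component with the later (higher) birth dies here.
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if birth[young] < signal[i]:    # skip zero-persistence pairs
                    pairs.append((birth[young], signal[i]))
                parent[young] = old
    # Surviving components are essential classes with infinite death time.
    pairs.extend((birth[r], math.inf) for r in {find(i) for i in range(n)})
    return pairs
```

For example, the signal `[3, 1, 4, 0, 5]` has two local minima; the shallower one (value 1) is born at 1 and dies at the separating maximum 4, giving the diagram `[(1, 4), (0, inf)]`. Small noisy wiggles produce points close to the diagonal, which is why diagrams are a natural object for separating signal from noise.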
As is the case for many curved exponential families, the computation of maximum likelihood estimates in a multivariate normal model with a Kronecker covariance structure is typically carried out with an iterative algorithm, specifically, a…
We study a parametric family of latent variable models, namely topic models, equipped with a hierarchical structure among the topic variables. Such models may be viewed as a finite mixture of the latent Dirichlet allocation (LDA) induced…
We derive novel anti-concentration bounds for the difference between the maximal values of two Gaussian random vectors across various settings. Our bounds are dimension-free, scaling with the dimension of the Gaussian vectors only through…
Estimating parameters of functional ARMA, GARCH and invertible processes requires estimating lagged covariance and cross-covariance operators of Cartesian product Hilbert space-valued processes. Asymptotic results have been derived in…
We consider the problem of exact community recovery in the Labeled Stochastic Block Model (LSBM) with $k$ communities, where each pair of vertices is associated with a label from the set $\{0,1, \dots, L\}$. A pair of vertices from…
Recent advances have clarified the theoretical learning accuracy of Bayesian inference, revealing that the asymptotic behavior of metrics that assess predictive accuracy, such as generalization loss and free energy, is dictated by a rational…
We address the problem of consistency of the $k$-nearest neighbors kernel estimators of the density and the regression function in the multivariate case. We obtain the rates of strong uniform consistency on the whole space $\mathbb{R}^p$ for…
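As an illustration of the estimators in question, the sketch below uses the data-driven bandwidth $R_k(x)$, the distance from $x$ to its $k$-th nearest sample point, in $\hat f(x) = \frac{1}{n R_k(x)^p}\sum_i K\!\left(\frac{x - X_i}{R_k(x)}\right)$ and in the Nadaraya-Watson form for the regression function. The radial uniform kernel on the unit ball is an assumption chosen for simplicity; with it, $\hat f(x)$ reduces to the classical $k/(n V_p R_k(x)^p)$, where $V_p$ is the volume of the unit ball. Names are illustrative.

```python
import math
import random


def unit_ball_volume(p):
    """Lebesgue volume V_p of the unit Euclidean ball in R^p."""
    return math.pi ** (p / 2) / math.gamma(p / 2 + 1)


def uniform_radial_kernel(r, p):
    """Uniform density on the unit ball, written as a function of r = ||u||."""
    return 1.0 / unit_ball_volume(p) if r <= 1.0 else 0.0


def knn_bandwidth(x, data, k):
    """R_k(x): distance from x to its k-th nearest sample point."""
    return sorted(math.dist(x, xi) for xi in data)[k - 1]


def knn_kernel_density(x, data, k, kernel=uniform_radial_kernel):
    """f_hat(x) = (n R_k^p)^{-1} * sum_i K(||x - X_i|| / R_k)."""
    n, p = len(data), len(x)
    h = knn_bandwidth(x, data, k)
    return sum(kernel(math.dist(x, xi) / h, p) for xi in data) / (n * h ** p)


def knn_kernel_regression(x, data, y, k, kernel=uniform_radial_kernel):
    """Kernel-weighted average of responses with bandwidth R_k(x)."""
    p = len(x)
    h = knn_bandwidth(x, data, k)
    w = [kernel(math.dist(x, xi) / h, p) for xi in data]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
```

Because the bandwidth adapts to the local sample density, the estimator uses a small window where data are dense and a large one in the tails, which is exactly what makes uniform consistency over all of $\mathbb{R}^p$ delicate.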
This paper extends the idea of a generalized estimator for a scalar parameter (Vos, 2022) to multi-dimensional parameters both with and without nuisance parameters. The title reflects the fact that generalized estimators provide more than…
We propose center-outward superquantile and expected shortfall functions, with applications to multivariate risk measurements, extending the standard notion of value at risk and conditional value at risk from the real line to…
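For reference, the real-line notions being extended can be computed from a sample in a few lines. The sketch takes value at risk at level $\alpha$ as the lower empirical quantile (the $\lceil n\alpha \rceil$-th smallest loss) and computes the superquantile (conditional value at risk) through the Rockafellar-Uryasev formula $\mathrm{CVaR}_\alpha = \min_t \{ t + E[(L-t)^+]/(1-\alpha) \}$, whose minimum is attained at $t = \mathrm{VaR}_\alpha$. Losses are taken as "larger is worse"; the center-outward multivariate construction of the paper is not reproduced here.

```python
import math


def empirical_var(losses, alpha):
    """Empirical value at risk: the ceil(n * alpha)-th smallest loss, 0 < alpha < 1."""
    xs = sorted(losses)
    return xs[math.ceil(len(xs) * alpha) - 1]


def empirical_cvar(losses, alpha):
    """Superquantile / CVaR via Rockafellar-Uryasev, evaluated at its minimizer t = VaR."""
    t = empirical_var(losses, alpha)
    n = len(losses)
    return t + sum(max(x - t, 0.0) for x in losses) / (n * (1 - alpha))
```

With the losses `1, 2, ..., 100` and $\alpha = 0.75$, VaR is 75 and CVaR is 88, the average of the worst 25 losses.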
Sampling from the posterior is a key technical problem in Bayesian statistics. Rigorous guarantees are difficult to obtain for commonly used Markov chain Monte Carlo algorithms. In this paper, we study an alternative class of algorithms…
Split conformal prediction (CP) is arguably the most popular CP method for uncertainty quantification, enjoying both academic interest and widespread deployment. However, the original theoretical analysis of split CP makes the crucial…
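The standard split CP recipe is short enough to state as code: fit a model on one half of the data, compute absolute residuals on the other (calibration) half, and widen predictions by the $\lceil (n_{cal}+1)(1-\alpha)\rceil$-th smallest residual. The sketch below uses a least-squares line as the underlying model purely for illustration; any regressor can be plugged in. Names are hypothetical.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least-squares line y = a + b x, returned as a predictor."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def split_conformal(xs, ys, alpha, seed=0):
    """Return x -> (lo, hi), a split-conformal interval with nominal level 1 - alpha."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    train, calib = idx[:half], idx[half:]
    model = fit_line([xs[i] for i in train], [ys[i] for i in train])
    # Nonconformity scores on the held-out calibration set.
    scores = sorted(abs(ys[i] - model(xs[i])) for i in calib)
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    qhat = math.inf if rank > n else scores[rank - 1]   # too few points: trivial interval
    return lambda x: (model(x) - qhat, model(x) + qhat)
```

The $(n+1)$ correction in the rank is what yields marginal coverage of at least $1-\alpha$ under the classical analysis; when $\alpha$ is too small for the calibration size, the only valid interval is the whole real line.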
The Gaussian correlation inequality (GCI) for symmetric n-rectangles is improved if the absolute values of the components have a joint cumulative distribution function (cdf) that is MTP2 (multivariate totally positive of order 2). Inequalities of the here…
This paper studies a factor modeling-based approach for clustering high-dimensional data generated from a mixture of strongly correlated variables. Statistical modeling with correlated structures pervades modern applications in economics,…
Multivariate elliptically contoured distributions are widely used for modeling correlated and non-Gaussian data. In this work, we study the kurtosis of the elliptical model, which is an important parameter in many statistical analyses.…
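One common parametrization ties the elliptical kurtosis parameter $\kappa$ to Mardia's multivariate kurtosis: for an elliptical law on $\mathbb{R}^p$ with finite fourth moments, $E[D^4] = p(p+2)(1+\kappa)$, where $D^2$ is the squared Mahalanobis distance and $\kappa = 0$ for the normal. The sketch below is the classical moment estimator $\hat\kappa = b_{2,p}/(p(p+2)) - 1$ built on Mardia's sample statistic $b_{2,p}$, restricted to $p = 2$ so the covariance inverse can be written by hand. It illustrates the parameter only and is not claimed to be the estimator studied in the paper.

```python
import random


def mardia_kurtosis_2d(points):
    """Mardia's b_{2,2}: mean of squared Mahalanobis distances squared (MLE covariance)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    det = sxx * syy - sxy ** 2
    total = 0.0
    for x, y in points:
        dx, dy = x - mx, y - my
        # (dx, dy) S^{-1} (dx, dy)^T with S^{-1} = [[syy, -sxy], [-sxy, sxx]] / det
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        total += d2 * d2
    return total / n


def elliptical_kurtosis_estimate(points):
    """Moment estimator kappa_hat = b_{2,p} / (p (p + 2)) - 1 with p = 2."""
    return mardia_kurtosis_2d(points) / 8.0 - 1.0
```

The estimate is affine invariant, so rescaling or rotating the data leaves it unchanged, and it should be near 0 for Gaussian samples and positive for heavier-tailed elliptical laws such as the multivariate $t$.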
The empirical Wasserstein projection (WP) distance quantifies the Wasserstein distance from the empirical distribution to a set of probability measures satisfying given expectation constraints. The WP is a powerful tool because it mitigates…
Cross-validation (CV) is the default choice for evaluating the performance of machine learning models. Despite its wide usage, its statistical benefits remain only partially understood, especially in challenging nonparametric regimes. In…
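For readers less familiar with the procedure, a minimal $K$-fold CV estimate of prediction risk looks like the sketch below: shuffle the indices, split them into $K$ folds, fit on $K-1$ folds, score the held-out fold, and average the losses over all points. The two toy learners (sample mean and least-squares line) are assumptions for illustration. Setting $K = n$ gives leave-one-out CV.

```python
import random


def fit_mean(xs, ys):
    """Constant predictor equal to the training mean of y."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Ordinary least-squares line y = a + b x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + b * (x - mx)


def squared_loss(y, yhat):
    return (y - yhat) ** 2


def k_fold_cv(xs, ys, fit, loss, k, seed=0):
    """Average held-out loss over a random partition into k folds."""
    n = len(xs)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::k] for j in range(k)]   # k disjoint, nearly equal folds
    total = 0.0
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum(loss(ys[i], model(xs[i])) for i in fold)
    return total / n
```

For the mean predictor, leave-one-out residuals have the closed form $n(y_i - \bar y)/(n-1)$, which makes a convenient exact check of the implementation.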
Differential privacy (DP) is a class of mathematical standards for assessing the privacy provided by a data-release mechanism. This work concerns two important flavors of DP that are related yet conceptually distinct: pure…
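As a textbook illustration of pure DP (not a mechanism taken from this work), the Laplace mechanism releases $f(D) + \text{Laplace}(0, \Delta/\varepsilon)$, where $\Delta$ is the $\ell_1$ sensitivity of $f$. Pure $\varepsilon$-DP means the log density ratio of the outputs under neighboring datasets never exceeds $\varepsilon$, which the sketch checks numerically on a grid. The function names are hypothetical.

```python
import math
import random


def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release true_value + Laplace(0, b) noise with b = sensitivity / epsilon."""
    b = sensitivity / epsilon
    # The difference of two iid exponentials with mean b is Laplace(0, b).
    return true_value + rng.expovariate(1 / b) - rng.expovariate(1 / b)


def laplace_log_density(z, mu, b):
    return -math.log(2 * b) - abs(z - mu) / b


def max_privacy_loss(f_d, f_dprime, sensitivity, epsilon, grid):
    """Largest |log p(z | D) - log p(z | D')| over the grid of outputs z."""
    b = sensitivity / epsilon
    return max(abs(laplace_log_density(z, f_d, b) - laplace_log_density(z, f_dprime, b))
               for z in grid)
```

For a counting query ($\Delta = 1$) whose value differs by one between neighboring datasets, the privacy loss is exactly $\varepsilon$ for outputs outside the two true values and smaller between them, so the bound is tight but never exceeded.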