Statistics Theory
We develop a pseudo-likelihood theory for rank one matrix estimation problems in the high dimensional limit. We prove a variational principle for the limiting pseudo-maximum likelihood which also characterizes the performance of the…
We consider the Gaussian kernel density estimator with bandwidth $\beta^{-\frac12}$ of $n$ iid Gaussian samples. Using the Kac-Rice formula and an Edgeworth expansion, we prove that the expected number of modes on the real line scales as…
Random forests are popular methods for regression and classification analysis, and many different variants have been proposed in recent years. One interesting example is the Mondrian random forest, in which the underlying constituent trees…
The main objective of this paper is to optimally estimate Sobol' indices of any order when a single input/output i.i.d. sample is available. Our approach rests on three main ingredients: semi-parametric estimation theory, high-order…
Theoretical guarantees are established for a standard estimator in a semi-parametric finite mixture model, where each component density is modeled as a product of univariate densities under a conditional independence assumption. The focus…
In this paper, we study the problem of finding a collection of planted cycles in an Erdős-Rényi random graph $G \sim \mathcal{G}(n, \lambda/n)$, in analogy to the famous Planted Clique Problem. When the cycles are planted on a uniformly random…
This work investigates the nonparametric estimation of the vector field of a noisy Ordinary Differential Equation (ODE) in high-dimensional ambient spaces, under the assumption that the initial conditions are sampled from a…
We establish non-asymptotic error bounds for the classical Maximum Likelihood Estimation of the transition matrix of a given Markov chain. Meanwhile, in the reversible case, we propose a new reversibility-preserving online Symmetric…
Designing efficient and rigorous numerical methods for sequential decision-making under uncertainty is a difficult problem that arises in many application frameworks. In this paper we focus on the numerical solution of a subclass of…
We consider the problem of estimating curvature where the data can be viewed as a noisy sample from an underlying manifold. For manifolds of dimension greater than one there are multiple definitions of local curvature, each suggesting a…
The present work introduces curvature-based rejection sampling (CURS). This is a method for sampling from a general class of probability densities defined on Riemannian manifolds. It can be used to sample from any probability density which…
This paper investigates the theoretical properties of Dirichlet kernel density estimators for compositional data supported on simplices, for the first time addressing scenarios involving time-dependent observations characterized by strong…
In this paper we first introduce the setting of filtering on Stiefel manifolds. Then, assuming the underlying system process is constant, the convergence of the extended Kalman filter with Stiefel manifold-valued observations is proved.…
This paper investigates global and local laws for sample covariance matrices with general growth rates of dimensions. The sample size $N$ and population dimension $M$ can have the same order in logarithm, which implies that their ratio…
All Resolutions Inference (ARI) is a post hoc inference method for functional Magnetic Resonance Imaging (fMRI) data analysis that provides valid lower bounds on the proportion of truly active voxels within any, possibly data-driven,…
Score-based Generative Models (SGMs) have achieved impressive performance in data generation across a wide range of applications and benefit from strong theoretical guarantees. Recently, methods inspired by statistical mechanics, in…
Early intervention in neurodegenerative diseases requires identifying periods before diagnosis when decline is rapid enough to detect whether a therapy is slowing progression. Since rapid decline typically occurs close to diagnosis,…
The problem of optimal linear estimation of functionals depending on the unknown values of a random field $\zeta(t,x)$, which is mean-square continuous periodically correlated with respect to the time argument $t\in\mathbb R$ and isotropic on…
System identification of autoregressive processes on Stiefel and Grassmann manifolds is presented and studied. We define the system parameters as elements in the orthogonal group and we show that the system can be estimated by averaging…
This work studies the statistical implications of using features composed of general linear combinations of covariates to partition the data in randomized decision tree and forest regression algorithms. Using random tessellation theory in…