统计理论
We consider testing the goodness-of-fit of a distribution against alternatives separated in sup norm. We study the twin settings of Poisson-generated count data with a large number of categories and high-dimensional multinomials. In…
Sample covariance matrices from multi-population typically exhibit several large spiked eigenvalues, which stem from differences between population means and are crucial for inference on the underlying data structure. This paper…
The normalized maximum likelihood (NML) code length is widely used as a model selection criterion based on the minimum description length principle, where the model with the shortest NML code length is selected. A common method to calculate…
Online averaged stochastic gradient algorithms are more and more studied since (i) they can deal quickly with large sample taking values in high dimensional spaces, (ii) they enable to treat data sequentially, (iii) they are known to be…
The field of quickest change detection (QCD) concerns design and analysis of algorithms to estimate in real time the time at which an important event takes place and identify properties of the post-change behavior. The goal is to devise a…
In descriptive statistics, $U$-statistics arise naturally in producing minimum-variance unbiased estimators. In 1984, Serfling considered the distribution formed by evaluating the kernel of the $U$-statistics and proposed generalized…
Diffusion models are a new class of generative models that revolve around the estimation of the score function associated with a stochastic differential equation. Subsequent to its acquisition, the approximated score function is then…
The relationship between a set of design points and the class of hierarchical polynomial models identifiable from the design is investigated. Saturated models are of particular interest. Necessary and sufficient conditions are derived on…
As the most fundamental problem in statistics, robust location estimation has many prominent solutions, such as the trimmed mean, Winsorized mean, Hodges Lehmann estimator, Huber M estimator, and median of means. Recent studies suggest that…
Obtaining guarantees on the convergence of the minimizers of empirical risks to the ones of the true risk is a fundamental matter in statistical learning. Instead of deriving guarantees on the usual estimation error, the goal of this paper…
Conditional independence and graphical models are well studied for probability distributions on product spaces. We propose a new notion of conditional independence for any measure $\Lambda$ on the punctured Euclidean space $\mathbb…
We conduct a non asymptotic study of the Cross Validation (CV) estimate of the generalization risk for learning algorithms dedicated to extreme regions of the covariates space. In this Extreme Value Analysis context, the risk function…
Understanding the tail behavior of distributions is crucial in statistical theory. For instance, the tail of a distribution plays a ubiquitous role in extreme value statistics, where it is associated with the likelihood of extreme events.…
This paper proposes procedures for testing the equality hypothesis and the proportionality hypothesis involving a large number of $q$ covariance matrices of dimension $p\times p$. Under a limiting scheme where $p$, $q$ and the sample sizes…
Descriptive statistics for parametric models are currently highly sensative to departures, gross errors, and/or random errors. Here, leveraging the structures of parametric distributions and their central moment kernel distributions, a…
We introduce a new framework to analyze shape descriptors that capture the geometric features of an ensemble of point clouds. At the core of our approach is the point of view that the data arises as sampled recordings from a metric…
This paper presents how to perform minimax optimal classification, regression, and density estimation based on fixed-$k$ nearest neighbor (NN) searches. We consider a distributed learning scenario, in which a massive dataset is split into…
Due to the lack of a canonical ordering in ${\mathbb R}^d$ for $d>1$, defining multivariate generalizations of the classical univariate ranks has been a long-standing open problem in statistics. Optimal transport has been shown to offer a…
The wild bootstrap is the resampling method of choice in survival analytic applications. Theoretic justifications rely on the assumption of existing intensity functions which is equivalent to an exclusion of ties among the event times.…
This paper introduces a new version of the smoothly trimmed mean with a more general version of weights, which can be used as an alternative to the classical trimmed mean. We derive its asymptotic variance and to further investigate its…