Statistics Theory
The logistic regression estimator is known to inflate the magnitude of its coefficients if the sample size $n$ is small, the dimension $p$ is (moderately) large or the signal-to-noise ratio $1/\sigma$ is large (probabilities of observing a…
We study Langevin-type algorithms for sampling from Gibbs distributions such that the potentials are dissipative and their weak gradients have finite moduli of continuity not necessarily convergent to zero. Our main result is a…
Optimal transportation theory and the related $p$-Wasserstein distance ($W_p$, $p\geq 1$) are widely applied in statistics and machine learning. In spite of their popularity, inference based on these tools has some issues. For instance, it…
We study Langevin dynamics for recovering the planted signal in the spiked matrix model. We provide a "path-wise" characterization of the overlap between the output of the Langevin algorithm and the planted signal. This overlap is…
We investigate the existence of a fundamental computation-information gap for the problem of clustering a mixture of isotropic Gaussians in the high-dimensional regime, where the ambient dimension $p$ is larger than the number $n$ of points.…
This paper proposes various nonparametric tools based on measure transportation for directional data. We use optimal transports to define new notions of distribution and quantile functions on the hypersphere, with meaningful quantile…
This paper generalizes the notion of sufficiency for estimation problems beyond maximum likelihood. In particular, we consider estimation problems based on Jones et al. and Basu et al. likelihood functions that are popular among…
This paper presents a new methodology for generating continuous statistical distributions, integrating the exponentiated odds ratio within the framework of survival analysis. This new method enhances the flexibility and adaptability of…
Stochastic Approximation (SA) was introduced in the early 1950s and has been an active area of research for several decades. While the initial focus was on statistical questions, it was seen to have applications to signal processing,…
Both the Bayes factor and the relative belief ratio satisfy the principle of evidence and so can be seen to be valid measures of statistical evidence. Certainly Bayes factors are regularly employed. The question then is: which of these…
Clustering is a common unsupervised machine learning technique for grouping data points based on similar features. We focus here on unsupervised clustering for contaminated data, i.e., in the case where K-medians should be…
We analyze the statistical properties of generalized cross-validation (GCV) and leave-one-out cross-validation (LOOCV) applied to early-stopped gradient descent (GD) in high-dimensional least squares regression. We prove that GCV is…
Quantifying uncertainty in high-dimensional sparse linear regression is a fundamental task in statistics that arises in various applications. One of the most successful methods for quantifying uncertainty is the debiased LASSO, which has a…
Given i.i.d. observations uniformly distributed on a closed manifold $\mathcal{M}\subseteq \mathbb{R}^p$, we study the spectral properties of the associated empirical graph Laplacian based on a Gaussian kernel. Our main results are…
These are lecture notes of the 51st Saint-Flour summer school, July 2023, on the topic of Bayesian nonparametric statistics.
This work contains the mathematical exploration of a few prototypical games in which central concepts from statistics and probability theory naturally emerge. The first two kinds of games are termed Fisher and Bayesian games, which are…
We consider the class of Erlang mixtures for the task of density estimation on the positive real line when the only available information is given as local moments, a histogram with potentially higher order moments in some bins. By…
Quantile regression is increasingly encountered in modern big data applications due to its robustness and flexibility. We consider the scenario of learning the conditional quantiles of a specific target population when the available data…
We consider Bayesian analysis on high-dimensional spheres with angular central Gaussian priors. These priors model antipodally symmetric directional data, are easily defined in Hilbert spaces and occur, for instance, in Bayesian binary…
The class of generalized gamma convolutions appeared in Thorin's work on the infinite divisibility of the log-normal and Pareto distributions. Although these distributions have been extensively studied in the…