统计理论
In this paper, we investigate the label shift quantification problem. We propose robust estimators of the label distribution which turn out to coincide with the Maximum Likelihood Estimator. We analyze the theoretical aspects and derive…
In recent years, diffusion models, and more generally score-based deep generative models, have achieved remarkable success in various applications, including image and audio generation. In this paper, we view diffusion models as an implicit…
Given a random sample from a density function supported on a manifold $M$, a new method for the estimating highest density regions of the underlying population is introduced. The new proposal is based on the empirical version of the opening…
This paper considers the asymptotic behavior in $\beta$-H\"older spaces, and under $L^p$ loss, of the gamma kernel density estimator introduced by Chen [Ann. Inst. Statist. Math. 52 (2000), 471-480] for the analysis of nonnegative data,…
The friendship paradox index is a network summary statistic used to quantify the friendship paradox, which describes the tendency for an individual's friends to have more friends than the individual. In this paper, we utilize Markov's…
We study the problem of learning multi-index models (MIMs), where the label depends on the input $\boldsymbol{x} \in \mathbb{R}^d$ only through an unknown $\mathsf{s}$-dimensional projection $\boldsymbol{W}_*^\mathsf{T} \boldsymbol{x} \in…
The broken random sample problem was first introduced by DeGroot, Feder, and Gole (1971, Ann. Math. Statist.): in each observation (batch), a random sample of $M$ i.i.d. point pairs $ ((X_i,Y_i))_{i=1}^M$ is drawn from a joint distribution…
The Gaussian kernel is one of the most important kernels, applicable to many research fields, including scientific computing and data science. In this paper, we present asymptotic analysis of the Gaussian kernel matrix in high dimension…
We study discrete-time, discrete-state multistate Markov models from the perspective of algebraic statistics. These models are widely studied in event history analysis, and are characterized by the state space, the initial distribution and…
Geometric quantiles are popular location functionals to build rank-based statistical procedures in multivariate settings. They are obtained through the minimization of a non-smooth convex objective function. As a result, the singularity of…
We consider the problem of parameter estimation from a generalized linear model with a random design matrix that is orthogonally invariant in law. Such a model allows the design have an arbitrary distribution of singular values and only…
This work is concerned with nonparametric goodness-of-fit testing in the context of nonlinear inverse problems with random observations. Bayesian posterior distributions based upon a Gaussian process prior distribution are proven to…
We study concentration inequalities for structured weighted sums of random data, including (i) tensor inner products and (ii) sequential matrix sums. We are interested in tail bounds and concentration inequalities for those structured…
We consider a problem of covariance estimation from a sample of i.i.d. high-dimensional random vectors. To avoid the curse of dimensionality, we impose an additional assumption on the structure of the covariance matrix $\Sigma$. To be more…
This paper investigates the detection and estimation of a single change in high-dimensional linear models. We derive minimax lower bounds for the detection boundary and the estimation rate, which uncover a phase transition governed by the…
We systematically investigate the preservation of differential privacy in functional data analysis, beginning with functional mean estimation and extending to varying coefficient model estimation. Our work introduces a distributed learning…
In this paper, we prove strong consistency of an estimator by the truncated singular value decomposition for a multivariate errors-in-variables linear regression model with collinearity. This result is an extension of Gleser's proof of the…
We study e-values for quantifying evidence against exchangeability and general invariance of a random variable under a compact group. We start by characterizing such e-values, and explaining how they nest traditional group invariance tests…
The Lasso is one of the most ubiquitous methods for variable selection in high-dimensional linear regression and has been studied extensively under different regimes. In a particular asymptotic setup entailing $n/p\to \text{constant}$, an…
This paper develops a unified framework for asymptotically minimax robust hypothesis testing under distributional uncertainty, applicable to both Bayesian and Neyman--Pearson formulations (Type-I and Type-II). Uncertainty classes based on…