Statistics Theory
We study the expectations of some ratio-type estimators under the gamma distribution. Expectations of ratio-type estimators are often difficult to compute because they are constructed by combining two separate estimators.…
Entropic optimal transport -- the optimal transport problem regularized by KL divergence -- is highly successful in statistical applications. Thanks to the smoothness of the entropic coupling, its sample complexity avoids the curse of…
For a broad class of nonlinear time series known as Bernoulli shifts, we establish the asymptotic normality of the smoothed periodogram estimator of the long-run variance. This estimator uses only a narrow band of Fourier frequencies around…
Recently, there has been substantial interest in statistical guarantees for cross-validation (CV) methods of uncertainty quantification in statistical learning (cf. Barber et al. 2021a, Liang and Barber 2024, Steinberger and Leeb 2023).…
This work studies nonparametric Bayesian estimation of the intensity function of an inhomogeneous Poisson point process in the important case where the intensity depends on covariates, based on the observation of a single realisation of the…
The Plackett--Luce model has been extensively used for rank aggregation in social choice theory. A central statistical question in this model concerns estimating the utility vector that governs the model's likelihood. In this paper, we…
Deep neural networks (DNNs) have emerged as a powerful methodology with significant practical successes in fields such as computer vision and natural language processing. Recent works have demonstrated that sparsely connected DNNs with…
This work proposes new estimators for discrete optimal transport plans that enjoy Gaussian limits centered at the true solution. This behavior stands in stark contrast with the performance of existing estimators, including those based on…
We introduce principal curves in Wasserstein space, and in general compact metric spaces. Our motivation for the Wasserstein case comes from optimal-transport-based trajectory inference, where a developing population of cells traces out a…
Doubly robust estimators with cross-fitting have gained popularity in causal inference due to their favorable structure-agnostic error guarantees. However, when additional structure, such as Hölder smoothness, is available then more…
We study the functional linear regression model with a scalar response and a Hilbert space-valued predictor, a canonical example of an ill-posed inverse problem. We show that the functional partial least squares (PLS) estimator attains…
The $\lambda$-exponential family generalizes the standard exponential family via a generalized convex duality motivated by optimal transport. It is the constant-curvature analogue of the exponential family from the information-geometric…
We rigorously analyse fully-trained neural networks of arbitrary depth in the Bayesian optimal setting in the so-called proportional scaling regime where the number of training samples and width of the input and all inner layers diverge…
Robust estimation of location is a fundamental problem in statistics, particularly in scenarios where data contamination by outliers or model misspecification is a concern. In univariate settings, methods such as the sample median and…
In this paper, we consider the estimation of regression coefficients and the signal-to-noise ratio (SNR) in high-dimensional Generalized Linear Models (GLMs), and explore their implications in inferring popular estimands such as average…
Tests based on heteroskedasticity robust standard errors are an important technique in econometric practice. Choosing the right critical value, however, is not simple at all: conventional critical values based on asymptotics often lead to…
The hierarchical Dirichlet process is the cornerstone of Bayesian nonparametric multilevel models. Its generative model can be described through a set of latent variables, commonly referred to as tables within the popular restaurant…
Model averaging (MA) and ensembling play a crucial role in statistical and machine learning practice. When multiple candidate models are considered, MA techniques can be used to weight and combine them, often resulting in improved…
We address the problem of estimating multiple modes of a multivariate density using persistent homology, a central tool in Topological Data Analysis. We introduce a method based on the preliminary estimation of the $H_0$-persistence diagram…
We derive Gaussian approximation bounds for $k$-Potential Nearest Neighbor ($k$-PNN) based random forest predictions based on a set of training points given by a Poisson process under fairly mild regularity assumptions on the data…