Statistics Theory - Scifaro

Finite-sample performance of the maximum likelihood estimator in logistic regression

Logistic regression is a classical model for describing the probabilistic dependence of binary responses on multivariate covariates. We consider the predictive performance of the maximum likelihood estimator (MLE) for logistic regression,…

Statistics Theory · Mathematics 2026-02-20 Hugo Chardon, Matthieu Lerasle, Jaouad Mourtada

Möbius inversion and the iterated bootstrap

Estimating nonlinear functionals of probability distributions from samples is a fundamental statistical problem. The "plug-in" estimator obtained by applying the target functional to the empirical distribution of samples is biased.…

Statistics Theory · Mathematics 2026-02-20 Florian Schäfer

On Sharpened Convergence Rate of Generalized Sliced Inverse Regression for Nonlinear Sufficient Dimension Reduction

Generalized Sliced Inverse Regression (GSIR) is one of the most important methods for nonlinear sufficient dimension reduction. As shown in Li and Song (2017), it enjoys a convergence rate that is independent of the dimension of the…

Statistics Theory · Mathematics 2026-02-19 Chak Fung Choi, Yin Tang, Bing Li

Separating Oblivious and Adaptive Models of Variable Selection

Sparse recovery is among the most well-studied problems in learning theory and high-dimensional statistics. In this work, we investigate the statistical and computational landscapes of sparse recovery with $\ell_\infty$ error guarantees.…

Statistics Theory · Mathematics 2026-02-19 Ziyun Chen, Jerry Li, Kevin Tian, Yusong Zhu

Estimation of Conformal Metrics

We study deformations of the geodesic distances on a domain of $\mathbb{R}^N$ induced by a function called the conformal factor. We show that under a positive reach assumption on the domain (not necessarily a submanifold) and mild assumptions on the…

Statistics Theory · Mathematics 2026-02-19 Jérôme Taupin

Orthogonal parametrisations of Extreme-Value distributions

Extreme value distributions are routinely employed to assess risks connected to extreme events in a large number of applications. They are typically two- or three-parameter distributions: the inference can be unstable, which is…

Statistics Theory · Mathematics 2026-02-19 Nathan Huet, Ilaria Prosdocimi

HAL-MLE Log-Splines Density Estimation (Part I: Univariate)

We study nonparametric maximum likelihood estimation of probability densities under a total variation (TV) type penalty, the sectional variation norm (also known as Hardy-Krause variation). TV regularization has a long history in regression and…

Statistics Theory · Mathematics 2026-02-19 Yilong Hou, Zhengpu Zhao, Yi Li, Mark van der Laan

Nonparametric estimation of linear multiplier for processes driven by a Hermite process

We study the problem of nonparametric estimation of the linear multiplier function $\theta(t)$ for processes satisfying stochastic differential equations of the type $$dX_t=\theta(t) X_t\,dt+ \epsilon\, dZ^{q,H}_t, \quad X_0=x_0, \quad 0\leq t \leq T$$…

Statistics Theory · Mathematics 2026-02-19 B. L. S. Prakasa Rao

Incomplete U-Statistics of Equireplicate Designs: Berry-Esseen Bound and Efficient Construction

U-statistics are a fundamental class of estimators that generalize the sample mean and underpin much of nonparametric statistics. Although extensively studied in both statistics and probability, key challenges remain: their high…

Statistics Theory · Mathematics 2026-02-19 Cesare Miglioli, Jordan Awan

On the distance between mean and geometric median in high dimensions

The geometric median, a notion of center for multivariate distributions, has gained recent attention in robust statistics and machine learning. Although conceptually distinct from the mean (i.e., expectation), we demonstrate that both are…

Statistics Theory · Mathematics 2026-02-19 Richard Schwank, Mathias Drton

Large-sample analysis of cost functionals for inference under the coalescent

The coalescent is a foundational model of latent genealogical trees under neutral evolution, but suffers from intractable sampling probabilities. Methods for approximating these sampling probabilities either introduce bias or fail to scale…

Statistics Theory · Mathematics 2026-02-19 Martina Favero, Jere Koskela

Asymptotics for conformal inference

Conformal inference is a versatile tool for building prediction sets in regression or classification. We study the false coverage proportion (FCP) in a simultaneous inference setting with a calibration sample of $n$ points and a test sample…

Statistics Theory · Mathematics 2026-02-19 Ulysse Gazin

Adjusted Scores for Discrete Langevin Algorithms

Sampling from discrete distributions is a ubiquitous task in machine learning, recently revisited by the emergence of discrete diffusion models. While Langevin algorithms constitute the state of the art for continuous spaces, discrete…

Statistics Theory · Mathematics 2026-02-18 Armand Gissler, Saeed Saremi, Francis Bach

Optimal detection of planted stars via a random energy model

We study the problem of detecting a planted star in the Erdős–Rényi random graph $G(n,m)$, formulated as a hypothesis test. We determine the scaling window for critical detection in $m$ in terms of the star size, and characterize…

Statistics Theory · Mathematics 2026-02-18 Ijay Narang, Will Perkins, Timothy L. H. Wee

Bayes Risk for Goodness of Fit Tests

We develop a unified framework for goodness-of-fit (GOF) testing through the lens of Bayes risk. Classical GOF procedures are commonly calibrated either at fixed significance level (CLT scale) or through exponential error exponents (LDP…

Statistics Theory · Mathematics 2026-02-18 Nicholas G. Polson, Vadim Sokolov, Daniel Zantedeschi

Theoretical guarantees for change localization using conformal p-values

Changepoint localization aims to provide confidence sets for a changepoint (if one exists). Existing methods either rely on strong parametric assumptions, provide only asymptotic guarantees, or focus on a particular kind of…

Statistics Theory · Mathematics 2026-02-18 Swapnaneel Bhattacharyya, Aaditya Ramdas

One Step to Efficient Synthetic Data

A common approach to synthetic data is to sample from a fitted model. We show that under general assumptions, this approach results in a sample with inefficient estimators and whose joint distribution is inconsistent with the true…

Statistics Theory · Mathematics 2026-02-18 Jordan Awan, Zhanrui Cai

Quasi-Bayes properties of a recursive procedure for mixtures

Bayesian methods are often optimal, yet increasing pressure for fast computations, especially with streaming data, brings renewed interest in faster, possibly sub-optimal, solutions. The extent to which these algorithms approximate Bayesian…

Statistics Theory · Mathematics 2026-02-18 Sandra Fortini, Sonia Petrone

Topological trivialization in non-convex empirical risk minimization

Given data $\{({\boldsymbol x}_i,y_i): i\le n\}$, with ${\boldsymbol x}_i$ standard $d$-dimensional Gaussian feature vectors, and $y_i\in{\mathbb R}$ response variables, we study the general problem of learning a model parametrized by…

Statistics Theory · Mathematics 2026-02-17 Andrea Montanari, Basil Saeed

Frequentist Regret Analysis of Gaussian Process Thompson Sampling via Fractional Posteriors

We study Gaussian Process Thompson Sampling (GP-TS) for sequential decision-making over compact, continuous action spaces and provide a frequentist regret analysis based on fractional Gaussian process posteriors, without relying on domain…

Statistics Theory · Mathematics 2026-02-17 Somjit Roy, Prateek Jaiswal, Anirban Bhattacharya, Debdeep Pati, Bani K. Mallick