Statistics Theory - Scifaro

Riesz representers for the rest of us

The application of semiparametric efficient estimators, particularly those that leverage machine learning, is rapidly expanding within epidemiology and causal inference. This literature is increasingly invoking the Riesz representation…

Statistics Theory · Mathematics 2026-01-13 Nicholas T. Williams, Oliver J. Hines, Kara E. Rudolph

Asymptotically well-calibrated Bayesian $p$-value using the Kolmogorov-Smirnov statistic

The posterior predictive $p$-value (ppp) is widely used in Bayesian model evaluation. However, due to double use of the data, the ppp may not be a valid $p$-value even in large samples: The asymptotic null distribution of the ppp can be…

Statistics Theory · Mathematics 2026-01-13 Yueming Shen, Surya Tokdar

Calibration Bands for Mean Estimates within the Exponential Dispersion Family

A statistical model is said to be calibrated if the resulting mean estimates perfectly match the true means of the underlying responses. Aiming for calibration is often not achievable in practice as one has to deal with finite samples of…

Statistics Theory · Mathematics 2026-01-13 Łukasz Delong, Selim Gatti, Mario V. Wüthrich

Fixed-strength spherical designs

A spherical $t$-design is a finite subset $X$ of the unit sphere such that every polynomial of degree at most $t$ has the same average over $X$ as it does over the entire sphere. Determining the minimum possible size of spherical designs,…

Statistics Theory · Mathematics 2026-01-13 Travis Dillon

Improved performance guarantees for Tukey's median

Is there a natural way to order data in dimension greater than one? The approach based on the notion of data depth, often associated with John Tukey, is among the most popular. Tukey's depth has found applications in robust statistics,…

Statistics Theory · Mathematics 2026-01-13 Stanislav Minsker, Yinan Shen

Directional testing for one-way MANOVA in divergent dimensions

Testing the equality of mean vectors across $g$ different groups plays an important role in many scientific fields. In regular frameworks, likelihood-based statistics under the normality assumption offer a general solution to this task.…

Statistics Theory · Mathematics 2026-01-13 Caizhu Huang, Claudia Di Caterina, Nicola Sartori

On the Effect of Misspecifying the Embedding Dimension in Low-rank Network Models

As network data has become ubiquitous in the sciences, there has been growing interest in network models whose structure is driven by latent node-level variables in a (typically low-dimensional) latent geometric space. These "latent…

Statistics Theory · Mathematics 2026-01-12 Roddy Taing, Keith Levin

Detecting Planted Structure in Circular Data

Hypothesis testing problems for circular data are formulated, where observations take values on the unit circle and may contain a hidden, phase-coherent structure. Under the null, the data are independent uniform on the unit circle; under…

Statistics Theory · Mathematics 2026-01-12 Taha Ameen, Bruce Hajek

What Functions Does XGBoost Learn?

This paper establishes a rigorous theoretical foundation for the function class implicitly learned by XGBoost, bridging the gap between its empirical success and our theoretical understanding. We introduce an infinite-dimensional function…

Statistics Theory · Mathematics 2026-01-12 Dohyeong Ki, Adityanand Guntuboyina

Drift estimation for a partially observed mixed fractional Ornstein–Uhlenbeck process

We consider estimation of the drift parameter $\vartheta>0$ in a partially observed Ornstein–Uhlenbeck type model driven by a mixed fractional Brownian noise. Our framework extends the partially observed model of…

Statistics Theory · Mathematics 2026-01-12 Chunhao Cai

Probabilistic Analysis of Scalogram Ridges in Signal Processing

While ridges in the scalogram, determined by the squared modulus of the analytic wavelet transform (AWT), are a widely accepted concept used in nonstationary time series analysis, their behavior in noisy environments remains…

Statistics Theory · Mathematics 2026-01-12 Gi-Ren Liu, Yuan-Chung Sheu, Hau-Tieng Wu

Measure estimation on a manifold explored by a diffusion process

From the observation of a diffusion path $(X_t)_{t\in [0,T]}$ on a compact connected $d$-dimensional manifold $\mathcal{M}$ without boundary, we consider the problem of estimating the stationary measure $\mu$ of the process. Wang and Zhu…

Statistics Theory · Mathematics 2026-01-12 Vincent Divol, Hélène Guérin, Dinh-Toan Nguyen, Viet Chi Tran

Precise Asymptotics for Spectral Methods in Mixed Generalized Linear Models

In a mixed generalized linear model, the goal is to learn multiple signals from unlabeled observations: each sample comes from exactly one signal, but it is not known which one. We consider the prototypical problem of estimating two…

Statistics Theory · Mathematics 2026-01-12 Yihan Zhang, Marco Mondelli, Ramji Venkataramanan

Convergence Rates for Learning Pseudo-Differential Operators

This paper establishes convergence rates for learning elliptic pseudo-differential operators, a fundamental operator class in partial differential equations and mathematical physics. In a wavelet-Galerkin framework, we formulate learning…

Statistics Theory · Mathematics 2026-01-09 Jiaheng Chen, Daniel Sanz-Alonso

Sharp Non-Asymptotic Bounds for the Star Discrepancy of Double-Infinite Random Matrices via Optimal Covering Numbers

We establish sharp non-asymptotic probabilistic bounds for the star discrepancy of double-infinite random matrices, a canonical model for sequences of random point sets in high dimensions. By integrating the recently proved…

Statistics Theory · Mathematics 2026-01-09 Xiaoda Xu, Jun Xian

Expected star discrepancy based on stratified sampling

We present two main contributions to the expected star discrepancy theory. First, we derive a sharper expected upper bound for jittered sampling, improving the leading constants and logarithmic terms compared to the state-of-the-art [Doerr,…

Statistics Theory · Mathematics 2026-01-09 Xiaoda Xu, Jun Xian

The Structure of Cross-Validation Error: Stability, Covariance, and Minimax Limits

Despite ongoing theoretical research on cross-validation (CV), many theoretical questions remain widely open. This motivates our investigation into how properties of algorithm-distribution pairs can affect the choice for the number of folds…

Statistics Theory · Mathematics 2026-01-09 Ido Nachum, Rüdiger Urbanke, Thomas Weinberger

Robust estimation with latin hypercube sampling: a central limit theorem for Z-estimators

Latin hypercube sampling (LHS) is a widely used stratified sampling method in computer experiments. In this work, we extend the existing convergence results for the sample mean under LHS to the broader class of $Z$-estimators, estimators…

Statistics Theory · Mathematics 2026-01-09 Faouzi Hakimi

A PAC-Bayes oracle inequality for sparse neural networks

We study the Gibbs posterior distribution for sparse deep neural nets in a nonparametric regression setting. The posterior can be accessed via Metropolis-adjusted Langevin algorithms. Using a mixture over uniform priors on sparse sets of…

Statistics Theory · Mathematics 2026-01-09 Maximilian F. Steffen, Mathias Trabs

Adaptive thresholding for wavelet-based nonparametric heteroskedastic variance estimation on the sphere

This paper investigates the nonparametric estimation of a heteroskedastic variance function on the sphere in a regression framework, assuming the variance belongs to a Besov regularity class. A needlet-based estimator is proposed, combining…

Statistics Theory · Mathematics 2026-01-08 Claudio Durastanti, Radomyra Shevchenko