Statistics Theory - Scifaro

Locally sharp goodness-of-fit testing in sup norm for high-dimensional counts

We consider testing the goodness-of-fit of a distribution against alternatives separated in sup norm. We study the twin settings of Poisson-generated count data with a large number of categories and high-dimensional multinomials. In…

Statistics Theory · Mathematics 2024-09-16 Subhodh Kotekal, Julien Chhor, Chao Gao

On spiked eigenvalues of a renormalized sample covariance matrix from multi-population

Sample covariance matrices from multi-population typically exhibit several large spiked eigenvalues, which stem from differences between population means and are crucial for inference on the underlying data structure. This paper…

Statistics Theory · Mathematics 2024-09-16 Weiming Li, Zeng Li, Junpeng Zhu

Foundation of Calculating Normalized Maximum Likelihood for Continuous Probability Models

The normalized maximum likelihood (NML) code length is widely used as a model selection criterion based on the minimum description length principle, where the model with the shortest NML code length is selected. A common method to calculate…

Statistics Theory · Mathematics 2024-09-16 Atsushi Suzuki, Kota Fukuzawa, Kenji Yamanishi

Convergence in quadratic mean of averaged stochastic gradient algorithms without strong convexity nor bounded gradient

Online averaged stochastic gradient algorithms are increasingly studied because (i) they can quickly handle large samples taking values in high-dimensional spaces, (ii) they allow data to be processed sequentially, (iii) they are known to be…

Statistics Theory · Mathematics 2024-09-16 Antoine Godichon-Baggioni

Quickest Change Detection Using Mismatched CUSUM

The field of quickest change detection (QCD) concerns design and analysis of algorithms to estimate in real time the time at which an important event takes place and identify properties of the post-change behavior. The goal is to devise a…

Statistics Theory · Mathematics 2024-09-13 Austin Cooper, Sean Meyn

Robust estimations from distribution structures: II. Central Moments

In descriptive statistics, $U$-statistics arise naturally in producing minimum-variance unbiased estimators. In 1984, Serfling considered the distribution formed by evaluating the kernel of the $U$-statistics and proposed generalized…

Statistics Theory · Mathematics 2024-09-13 Li Tuobang

KL Convergence Guarantees for Score diffusion models under minimal data assumptions

Diffusion models are a new class of generative models that revolve around the estimation of the score function associated with a stochastic differential equation. Subsequent to its acquisition, the approximated score function is then…

Statistics Theory · Mathematics 2024-09-13 Giovanni Conforti, Alain Durmus, Marta Gentiloni Silveri

Identifiability of Polynomial Models from First Principles and via a Gröbner Basis Approach

The relationship between a set of design points and the class of hierarchical polynomial models identifiable from the design is investigated. Saturated models are of particular interest. Necessary and sufficient conditions are derived on…

Statistics Theory · Mathematics 2024-09-12 Janet D. Godolphin, James D. E. Grant

Robust estimations from distribution structures: I. Mean

As the most fundamental problem in statistics, robust location estimation has many prominent solutions, such as the trimmed mean, Winsorized mean, Hodges-Lehmann estimator, Huber M-estimator, and median of means. Recent studies suggest that…

Statistics Theory · Mathematics 2024-09-12 Li Tuobang

On the Concentration of the Minimizers of Empirical Risks

Obtaining guarantees on the convergence of the minimizers of empirical risks to the ones of the true risk is a fundamental matter in statistical learning. Instead of deriving guarantees on the usual estimation error, the goal of this paper…

Statistics Theory · Mathematics 2024-09-12 Paul Escande

Graphical models for infinite measures with applications to extremes

Conditional independence and graphical models are well studied for probability distributions on product spaces. We propose a new notion of conditional independence for any measure $\Lambda$ on the punctured Euclidean space $\mathbb…

Statistics Theory · Mathematics 2024-09-12 Sebastian Engelke, Jevgenijs Ivanovs, Kirstin Strokorb

Cross-validation on Extreme Regions

We conduct a non-asymptotic study of the cross-validation (CV) estimate of the generalization risk for learning algorithms dedicated to extreme regions of the covariate space. In this Extreme Value Analysis context, the risk function…

Statistics Theory · Mathematics 2024-09-12 Anass Aghbalou, Patrice Bertail, François Portier, Anne Sabourin

Where does the tail start? Inflection Points and Maximum Curvature as Boundaries

Understanding the tail behavior of distributions is crucial in statistical theory. For instance, the tail of a distribution plays a ubiquitous role in extreme value statistics, where it is associated with the likelihood of extreme events.…

Statistics Theory · Mathematics 2024-09-11 Rafael Cabral, Maria de Iorio, Andrea Cremaschi

Many-sample tests for the equality and the proportionality hypotheses between large covariance matrices

This paper proposes procedures for testing the equality hypothesis and the proportionality hypothesis involving a large number of $q$ covariance matrices of dimension $p\times p$. Under a limiting scheme where $p$, $q$ and the sample sizes…

Statistics Theory · Mathematics 2024-09-11 Tianxing Mei, Chen Wang, Jianfeng Yao

Robust estimations from distribution structures: III. Invariant Moments

Descriptive statistics for parametric models are currently highly sensitive to departures, gross errors, and/or random errors. Here, leveraging the structures of parametric distributions and their central moment kernel distributions, a…

Statistics Theory · Mathematics 2024-09-11 Li Tuobang

A statistical framework for analyzing shape in a time series of random geometric objects

We introduce a new framework to analyze shape descriptors that capture the geometric features of an ensemble of point clouds. At the core of our approach is the point of view that the data arises as sampled recordings from a metric…

Statistics Theory · Mathematics 2024-09-11 Anne van Delft, Andrew J. Blumberg

Minimax Optimal Algorithms with Fixed-$k$-Nearest Neighbors

This paper presents how to perform minimax optimal classification, regression, and density estimation based on fixed-$k$ nearest neighbor (NN) searches. We consider a distributed learning scenario, in which a massive dataset is split into…

Statistics Theory · Mathematics 2024-09-11 J. Jon Ryu, Young-Han Kim

Distribution-free tests of multivariate independence based on center-outward quadrant, Spearman, Kendall, and van der Waerden statistics

Due to the lack of a canonical ordering in ${\mathbb R}^d$ for $d>1$, defining multivariate generalizations of the classical univariate ranks has been a long-standing open problem in statistics. Optimal transport has been shown to offer a…

Statistics Theory · Mathematics 2024-09-11 Hongjian Shi, Mathias Drton, Marc Hallin, Fang Han

A Discontinuity Adjustment for Subdistribution Function Confidence Bands Applied to Right-Censored Competing Risks Data (with Erratum)

The wild bootstrap is the resampling method of choice in survival analytic applications. Theoretic justifications rely on the assumption of existing intensity functions which is equivalent to an exclusion of ties among the event times.…

Statistics Theory · Mathematics 2024-09-11 Dennis Dobler, Merle Munko

Empirical likelihood for generalized smoothly trimmed mean

This paper introduces a new version of the smoothly trimmed mean with a more general version of weights, which can be used as an alternative to the classical trimmed mean. We derive its asymptotic variance and to further investigate its…

Statistics Theory · Mathematics 2024-09-10 Elina Kresse, Emils Silins, Janis Valeinis