Statistics Theory - Scifaro

Finite sample rates for logistic regression with small noise or few samples

The logistic regression estimator is known to inflate the magnitude of its coefficients if the sample size $n$ is small, the dimension $p$ is (moderately) large or the signal-to-noise ratio $1/\sigma$ is large (probabilities of observing a…

Statistics Theory · Mathematics 2024-03-01 Felix Kuchelmeister, Sara van de Geer

Non-asymptotic analysis of Langevin-type Monte Carlo algorithms

We study Langevin-type algorithms for sampling from Gibbs distributions such that the potentials are dissipative and their weak gradients have finite moduli of continuity not necessarily convergent to zero. Our main result is a…

Statistics Theory · Mathematics 2024-03-01 Shogo Nakakita

Inference via robust optimal transportation: theory and methods

Optimal transportation theory and the related $p$-Wasserstein distance ($W_p$, $p\geq 1$) are widely applied in statistics and machine learning. In spite of their popularity, inference based on these tools has some issues. For instance, it…

Statistics Theory · Mathematics 2024-03-01 Yiming Ma, Hang Liu, Davide La Vecchia, Matthieu Lerasle

High-dimensional Asymptotics of Langevin Dynamics in Spiked Matrix Models

We study Langevin dynamics for recovering the planted signal in the spiked matrix model. We provide a "path-wise" characterization of the overlap between the output of the Langevin algorithm and the planted signal. This overlap is…

Statistics Theory · Mathematics 2024-03-01 Tengyuan Liang, Subhabrata Sen, Pragya Sur

Computation-information gap in high-dimensional clustering

We investigate the existence of a fundamental computation-information gap for the problem of clustering a mixture of isotropic Gaussians in the high-dimensional regime, where the ambient dimension $p$ is larger than the number $n$ of points.…

Statistics Theory · Mathematics 2024-02-29 Bertrand Even, Christophe Giraud, Nicolas Verzelen

Nonparametric Measure-Transportation-Based Methods for Directional Data

This paper proposes various nonparametric tools based on measure transportation for directional data. We use optimal transports to define new notions of distribution and quantile functions on the hypersphere, with meaningful quantile…

Statistics Theory · Mathematics 2024-02-29 Marc Hallin, Hang Liu, Thomas Verdebout

Generalized Fisher-Darmois-Koopman-Pitman Theorem and Rao-Blackwell Type Estimators for Power-Law Distributions

This paper generalizes the notion of sufficiency for estimation problems beyond maximum likelihood. In particular, we consider estimation problems based on Jones et al. and Basu et al. likelihood functions that are popular among…

Statistics Theory · Mathematics 2024-02-29 Atin Gayen, M. Ashok Kumar

Advancing Continuous Distribution Generation: An Exponentiated Odds Ratio Generator Approach

This paper presents a new methodology for generating continuous statistical distributions, integrating the exponentiated odds ratio within the framework of survival analysis. This new method enhances the flexibility and adaptability of…

Statistics Theory · Mathematics 2024-02-28 Xinyu Chen, Yuanqi Xie, Achraf Cohen, Shusen Pu

Stochastic approximation in infinite dimensions

Stochastic Approximation (SA) was introduced in the early 1950s and has been an active area of research for several decades. While the initial focus was on statistical questions, it was seen to have applications to signal processing,…

Statistics Theory · Mathematics 2024-02-28 Rajeeva Laxman Karandikar, Bhamidi V Rao

How to Measure Evidence and Its Strength: Bayes Factors or Relative Belief Ratios?

Both the Bayes factor and the relative belief ratio satisfy the principle of evidence and so can be seen to be valid measures of statistical evidence. Certainly Bayes factors are regularly employed. The question then is: which of these…

Statistics Theory · Mathematics 2024-02-28 Luai Al-Labadi, Ayman Alzaatreh, Michael Evans

A penalized criterion for selecting the number of clusters for K-medians

Clustering is a common unsupervised machine learning technique for grouping data points based on similar features. We focus here on unsupervised clustering for contaminated data, i.e., in the case where K-medians should be…

Statistics Theory · Mathematics 2024-02-28 Antoine Godichon-Baggioni, Sobihan Surendran

Failures and Successes of Cross-Validation for Early-Stopped Gradient Descent

We analyze the statistical properties of generalized cross-validation (GCV) and leave-one-out cross-validation (LOOCV) applied to early-stopped gradient descent (GD) in high-dimensional least squares regression. We prove that GCV is…

Statistics Theory · Mathematics 2024-02-27 Pratik Patil, Yuchen Wu, Ryan J. Tibshirani

Debiased LASSO under Poisson-Gauss Model

Quantifying uncertainty in high-dimensional sparse linear regression is a fundamental task in statistics that arises in various applications. One of the most successful methods for quantifying uncertainty is the debiased LASSO, which has a…

Statistics Theory · Mathematics 2024-02-27 Pedro Abdalla, Gil Kur

A kernel-based analysis of Laplacian Eigenmaps

Given i.i.d. observations uniformly distributed on a closed manifold $\mathcal{M}\subseteq \mathbb{R}^p$, we study the spectral properties of the associated empirical graph Laplacian based on a Gaussian kernel. Our main results are…

Statistics Theory · Mathematics 2024-02-27 Martin Wahl

Bayesian nonparametric statistics, St-Flour lecture notes

These are lecture notes for the 51st Saint-Flour summer school, July 2023, on the topic of Bayesian nonparametric statistics.

Statistics Theory · Mathematics 2024-02-27 Ismaël Castillo

Statistical Games

This work contains the mathematical exploration of a few prototypical games in which central concepts from statistics and probability theory naturally emerge. The first two kinds of games are termed Fisher and Bayesian games, which are…

Statistics Theory · Mathematics 2024-02-27 Jozsef Konczer

Local moment matching with Erlang mixtures under automatic roughness penalization

We consider the class of Erlang mixtures for the task of density estimation on the positive real line when the only available information is given as local moments, that is, a histogram with potentially higher-order moments in some bins. By…

Statistics Theory · Mathematics 2024-02-27 Oskar Laverny, Philippe Lambert

Transfer Learning with Large-Scale Quantile Regression

Quantile regression is increasingly encountered in modern big data applications due to its robustness and flexibility. We consider the scenario of learning the conditional quantiles of a specific target population when the available data…

Statistics Theory · Mathematics 2024-02-27 Jun Jin, Jun Yan, Robert H. Aseltine, Kun Chen

Dimension-independent Markov chain Monte Carlo on the sphere

We consider Bayesian analysis on high-dimensional spheres with angular central Gaussian priors. These priors model antipodally symmetric directional data, are easily defined in Hilbert spaces and occur, for instance, in Bayesian binary…

Statistics Theory · Mathematics 2024-02-27 H. C. Lie, D. Rudolf, B. Sprungk, T. J. Sullivan

Estimation of multivariate generalized gamma convolutions through Laguerre expansions

The class of generalized gamma convolutions appeared in Thorin's work on the infinite divisibility of the log-Normal and Pareto distributions. Although these distributions have been extensively studied in the…

Statistics Theory · Mathematics 2024-02-27 Oskar Laverny, Esterina Masiello, Véronique Maume-Deschamps, Didier Rullière