Statistics — Scifaro

Rao-Blackwellized Score Matching on Manifolds

We study denoising score matching (DSM) when the latent distribution is supported on a smooth embedded manifold $M \subset \mathbb{R}^D$. Under ambient Gaussian corruption, the tangent denoising target contains a singular normal-fiber noise…

Machine Learning · Statistics 2026-05-28 Divit Rawal

Generalized Stochastic Approximation of the Log-Likelihood Ratio for Robust Sequential Change-Point Detection

Sequential change-point detection in non-Gaussian stochastic processes is challenging because the underlying densities are rarely known in real time. Classical parametric procedures such as CUSUM lose optimality under distributional…

Methodology · Statistics 2026-05-28 Serhii Zabolotnii

Federated Language Models Under Bandwidth Budgets: Distillation Rates and Conformal Coverage

Training a language model on data scattered across bandwidth-limited nodes that cannot be centralized is a setting that arises in clinical networks, enterprise knowledge bases, and scientific consortia. We study the regime in which data…

Machine Learning · Statistics 2026-05-28 Prasanjit Dubey, Xiaoming Huo

Invariant quantile regression for heterogeneous environments

In this paper, we propose an invariant quantile regression (IQR) framework specifically designed for multi-environment datasets, which captures the invariance across different environments. This framework is closely related to transfer…

Methodology · Statistics 2026-05-28 Bo Fu, Dandan Jiang

No Certificate for Alignment: Two Independent Impossibilities and the Pareto Frontier of Achievable Safety Guarantees

We argue that formal certification of AI alignment over open-ended or unbounded input domains is impossible under standard assumptions in computational complexity and learning theory, and characterise what remains achievable. Two…

Machine Learning · Statistics 2026-05-28 Ayushi Agarwal

A Unified Framework for Density Estimation under Right-Censored Point-Centred Quarter Sampling

While the point-centred quarter method (PCQM) is widely used for density estimation, existing methods for handling right-censored data from truncated search radii rely primarily on a Poisson model assuming complete spatial randomness (CSR),…

Methodology · Statistics 2026-05-28 Wenzhe Huang, Guochun Shen, Dingliang Xing, Jiangyan Zhao

Moment Matters: Mean and Variance Causal Graph Discovery from Heteroscedastic Observational Data

Heteroscedasticity -- where the variance of a variable changes with other variables -- is pervasive in real data, and elucidating why it arises from the perspective of statistical moments is crucial in scientific knowledge discovery and…

Machine Learning · Statistics 2026-05-28 Yoichi Chikahara

The Well-Tempered Classifier: Some Elementary Properties of Temperature Scaling

Temperature scaling is a simple method for controlling the uncertainty of probabilistic models. It is mostly used in two contexts: improving the calibration of classifiers and tuning the stochasticity of large language models (LLMs).…

Machine Learning · Statistics 2026-05-28 Pierre-Alexandre Mattei, Bruno Loureiro

Corrected Samplers for Discrete Flow Models

Discrete flow models (DFMs) have been proposed to learn the data distribution on a finite state space, offering a flexible framework as an alternative to discrete diffusion models. A line of recent work has studied samplers for discrete…

Machine Learning · Statistics 2026-05-28 Zhengyan Wan, Yidong Ouyang, Liyan Xie, Hongyuan Zha, Fang Fang, Guang Cheng

MitoFREQ: A Novel Approach for Mitogenome Frequency Estimation from Top-level Haplogroups and Single Nucleotide Variants

Lineage marker population frequencies can serve as one way to express evidential value in forensic genetics. However, for high-quality whole mitochondrial DNA genome sequences (mitogenomes), population data remain limited. In this paper, we…

Applications · Statistics 2026-05-28 Mikkel Meyer Andersen, Nicole Huber, Kimberly S Andreaggi, Tóra Oluffa Stenberg Olsen, Walther Parson, Charla Marshall

Cauchy-Gaussian Overbound for Heavy-tailed GNSS Measurement Errors

Overbounds of heavy-tailed measurement errors are essential to meet stringent navigation requirements in integrity monitoring applications. This paper proposes to leverage the bounding sharpness of the Cauchy distribution in the core and…

Applications · Statistics 2026-05-28 Zhengdao Li, Penggao Yan, Weisong Wen, Li-Ta Hsu

DAISI: Data Assimilation with Inverse Sampling using Stochastic Interpolants

Data assimilation (DA) is a cornerstone of scientific and engineering applications, combining model forecasts with sparse and noisy observations to estimate latent system states. Classical high-dimensional DA methods, such as the ensemble…

Machine Learning · Statistics 2026-05-28 Martin Andrae, Erik Wikingsson, So Takao, Tomas Landelius, Fredrik Lindsten

Linear Causal Representation Learning by Topological Ordering, Pruning, and Disentanglement

Causal representation learning (CRL) has garnered increasing interest from the causal inference and artificial intelligence communities due to its potential to disentangle complex data-generating mechanisms into causally interpretable latent…

Machine Learning · Statistics 2026-05-28 Hao Chen, Lin Liu, Yu Guang Wang

Rescuing double robustness: safe estimation under complete misspecification

Double robustness is a major selling point of semiparametric and missing data methodology. Its virtues lie in protection against partial nuisance misspecification and asymptotic semiparametric efficiency under correct nuisance…

Methodology · Statistics 2026-05-28 Lorenzo Testa, Francesca Chiaromonte, Kathryn Roeder

Bayesian optimal experimental design with Wasserstein information criteria

Bayesian optimal experimental design (OED) provides a principled framework for selecting observations or experiments. We introduce new Bayesian design criteria based on the expected Wasserstein-$p$ distance between the prior and posterior…

Methodology · Statistics 2026-05-28 Tapio Helin, Youssef Marzouk, Jose Rodrigo Rojo-Garcia

Bayesian Latent Class Regression with Interpretable Binary Profiles

High-dimensional categorical data arise in diverse scientific domains and are often accompanied by covariates. Latent class regression models are routinely used in such settings, reducing dimensionality by assuming conditional independence…

Methodology · Statistics 2026-05-28 Yuren Zhou, Yuqi Gu, David B. Dunson

A Bayesian Nonparametric Perspective on Mahalanobis Distance for Out of Distribution Detection

Bayesian nonparametric methods are naturally suited to the problem of out-of-distribution (OOD) detection. However, these techniques have largely been eschewed in favor of simpler methods based on distances between pre-trained or learned…

Machine Learning · Statistics 2026-05-28 Randolph W. Linderman, Noah Cowan, Yiran Chen, Scott W. Linderman

Isometry pursuit

Isometry pursuit is a convex algorithm for identifying orthonormal column-submatrices of wide matrices. It consists of a novel normalization method followed by multitask basis pursuit. Applied to Jacobians of putative coordinate functions,…

Machine Learning · Statistics 2026-05-28 Samson Koelle, Marina Meila

Conformal Prediction for Hierarchical Data

We consider conformal prediction for multivariate data and focus on hierarchical data, where some components are linear combinations of others. Intuitively, the hierarchical structure can be leveraged to reduce the size of prediction…

Machine Learning · Statistics 2026-05-28 Guillaume Principato, Gilles Stoltz, Yvenn Amara-Ouali, Yannig Goude, Bachir Hamrouche, Jean-Michel Poggi

Learning with Importance Weighted Variational Inference

Several variational bounds involving importance weighting ideas generalize the Evidence Lower BOund (ELBO) for marginal likelihood optimization, such as the Importance-weighted Auto-Encoder (IWAE), Variational Rényi (VR) and VR-IWAE…

Machine Learning · Statistics 2026-05-28 Kamélia Daudel, François Roueff