Statistics — Scifaro

Diffusion Models Are Statistically Optimal for Learning Low-Dimensional Multi-Modal Distributions

Score-based diffusion models have demonstrated remarkable empirical success in learning high-dimensional distributions, particularly those exhibiting low-dimensional and multi-modal structures. However, theoretical understanding of their…

Machine Learning · Statistics 2026-05-29 Jingda Wu, Changxiao Cai

Accurate and Efficient MCMC for Latent Position Models

Latent position models (LPMs) are a large and popular class of models for random graphs. However, fitting Bayesian LPMs is computationally challenging: computing the likelihood even once takes time that is quadratic in the number of…

Computation · Statistics 2026-05-29 Zonghao Li, Aaron Smith

Credible rectangles for high-dimensional posterior comparison

We propose a Bayesian framework for uncertainty quantification and comparison in brain connectivity graph analysis. Standard graph-based approaches typically rely on point estimates of correlation matrices, overlooking the uncertainty…

Methodology · Statistics 2026-05-29 Alice Chevaux, Julyan Arbel, Guillaume Kon Kam King, Sophie Achard

Constructing Contact and Connectivity Matrices for Infectious Disease Modelling

Contact (or mixing, or more generally connectivity) matrices are a fundamental component of modelling and inference for infectious disease epidemiology. Their structure and parametrisation directly account for the frequency of interactions…

Applications · Statistics 2026-05-29 Xiahui Li, Dongni Zhang, Neha Bansal, Jessica R. E. Bridgen, Chris Jewell, Emma McBryde, Glenn Marion, Emily Nixon, Philip D. O'Neill, David J. Pascall, Lorenzo Pellis, Simon E. F. Spencer, Panayiota Touloupou, Lloyd Chapman, Ben Swallow

Identification-Robust Testing in Endogenous Functional Linear Regression with Weak or Irrelevant Auxiliary Variables

We develop dimension-reduction-free tests for the slope function in functional linear regression when the functional regressor may be endogenous or measured with error. The tests are based on a functional moment condition induced by an…

Methodology · Statistics 2026-05-29 Won-Ki Seo

Modifying causal models to distinguish between transient and lasting causal effects

This paper considers how to classify the effects of interventions in causal models for outcomes and exposures observed over time. First, we demonstrate the limitations of the most common uses of potential outcomes and causal directed…

Methodology · Statistics 2026-05-29 Russell Steele, Naftali Weinberger, Tess Baker, Ian Shrier

Statistical Tapers for Correlation-Based Localization in Ensemble Data Assimilation

Localization is essential in ensemble-based data assimilation because finite ensembles produce noisy covariance estimates, causing spurious updates and excessive loss of ensemble variance. In subsurface applications, localization is usually…

Methodology · Statistics 2026-05-29 Alexandre A. Emerick, Vinicius Luiz Santos Silva

Joint Model and Data Sparsification via the Marginal Likelihood

Sparse recovery in linear systems underpins applications from signal processing to high-dimensional regression. Sparse Bayesian Learning, grounded in the principle of automatic relevance determination (ARD), offers a practical Bayesian…

Machine Learning · Statistics 2026-05-29 Alexander Timans, Thomas Möllenhoff, Christian A. Naesseth, Mohammad Emtiyaz Khan, Eric Nalisnick

Fisher's ideas and the design of field experiments in agronomy and plant breeding

R. A. Fisher was one of the greatest scientists of the last century. He made many ground-breaking contributions, so many indeed that it seems almost impossible to list all of them. His revolutionary contributions to the design of…

Methodology · Statistics 2026-05-29 Hans-Peter Piepho

Instance-dependent Stochastic Lipschitz bandit

We study the Lipschitz bandit problem, where a learner sequentially maximizes an unknown Lipschitz function $f$ over a domain $\mathcal{X} \subset [0,1]^d$ using noisy pointwise evaluations. Existing regret bounds are either worst-case,…

Machine Learning · Statistics 2026-05-29 Marius Potfer, Vianney Perchet

A Jensen-Shannon divergence based $k$-NN algorithm for missing value imputation in compositional data

A novel nonparametric method to impute missing values in compositional data is developed. The method is based on the $k$-NN algorithm, utilizes the Jensen-Shannon divergence and employs the Fréchet mean to allow for more flexibility…

Methodology · Statistics 2026-05-29 Michail Tsagris, Connie Stewart, Abdulaziz Alenazi

Eigen-Spike Emergence and Quadratic Equivalents for Conjugate Kernels on Nonlinearly Separable Data

Recent work in random matrix theory (RMT) has developed the notion of deterministic equivalents: typically linear surrogate models that approximate the spectral behavior of large nonlinear random matrices, such as nonlinear feature maps in…

Machine Learning · Statistics 2026-05-29 Collin Cranston, Zhichao Wang, Todd Kemp, Michael W. Mahoney

Matching Rates and Optimal Allocation for Federated Probe-Logit Distillation under Heterogeneous Bandwidth Budgets

In federated language modeling, $K$ nodes each hold $n$ samples but cannot pool data or exchange full-precision gradients or weights. We study the minimax rate at which a conditional distribution over $V$ tokens can be estimated when each…

Machine Learning · Statistics 2026-05-29 Prasanjit Dubey, Xiaoming Huo

Experimentation for Different Scheduling Policies on Queues: Mixed Differences-in-Q Estimators Based on Little's Law

In data centers, tasks are dispatched to various servers to evenly distribute the workload. When a data center considers implementing a new scheduling algorithm, it typically conducts an A/B test prior to deployment to assess the real-world…

Methodology · Statistics 2026-05-29 Nanshan Jia, Ramesh Johari, Nian Si, Zeyu Zheng

Hierarchical forecasting: The role of information

In hierarchical forecasting, the process of forecast reconciliation transforms a set of "base" or "raw" forecasts, which do not satisfy the hierarchical aggregation constraints in the real data, into a set of "coherent" forecasts, which do…

Methodology · Statistics 2026-05-29 Minh Nguyen, Farshid Vahid, Shanika L Wickramasuriya

Learning study similarity to investigate heterogeneity in meta-analysis using LLMs and triplet loss

Meta-analyses of observational studies often show substantial between-study heterogeneity, limiting the interpretability of pooled estimates. Meta-regression can be used to explore heterogeneity, but it is often underpowered to handle…

Methodology · Statistics 2026-05-29 Kanella Panagiotopoulou, Theodoros Evrenoglou

Change-point estimation for Weibull time series with copula-based Markov models

We study offline change-point estimation for time series data exhibiting nonlinear serial dependence. To address this problem, we propose a copula-based Markov chain model with Weibull marginal distributions, which is suitable for modeling…

Methodology · Statistics 2026-05-29 Li-Hsien Sun, Zong-Yuan Huang, Yi-Ling Huang, Chi-Yang Chiu, Ning Ning

Active learning strategy for excursion-set confidence regions of functional simulator outputs

Estimating excursion set confidence regions seeks to identify regions where a function may exceed some threshold with a given confidence level. This paper focuses on estimating such confidence regions in cases where the function has random…

Methodology · Statistics 2026-05-29 Lucas Brunel, Mathieu Balesdent, Loïc Brevault, Rodolphe Le Riche, Bruno Sudret

`pandemonium`: High Dimensional Analysis in Linked Spaces

A common challenge in data analysis is uncovering relationships between predictors and responses in problems involving large numbers of both. When the number of predictors and responses is limited, visual approaches are particularly…

Computation · Statistics 2026-05-29 Gabriel McCoy, German Valencia, Ursula Laa

Deep Optimal Individualized Treatment Rules for Bivariate Survival Outcomes via Adaptive Prediction-Powered Learning

In randomized trials involving multiple treatments, bivariate survival outcomes present significant analytical challenges for making decisions. This paper addresses the problem of deriving optimal individualized treatment rules to maximize…

Machine Learning · Statistics 2026-05-29 Kun Ren, Yifan Cui, Wen Su