Statistics — Scifaro

Reliable model selection in the presence of parameter non-identifiability

Mathematical models are invaluable for understanding and predicting how biological systems behave, although their construction requires specifying mechanisms and relationships that are often not perfectly known. In the presence of multiple…

Methodology · Statistics 2026-05-20 Yong See Foo, Torkel E. Loman, Alexander P. Browning, Ivo Siekmann, Ruth E. Baker, Jennifer A. Flegg

Assessing covariate-adjusted risk differences in small-sample clinical trials

Binary endpoints are common in clinical trials and conditional odds ratios have traditionally been used to assess treatment effects. However, the interpretation of odds ratios is difficult, they are non-collapsible and rely on strong…

Methodology · Statistics 2026-05-20 Martin Schnuerch, Alex Ocampo, Klaus Kähler Holst, Christian Stock

Making Uncertainty Visible: Multiverse Analysis for Robust Computational Social Science

Through case studies, we demonstrate how multiverse analysis can strengthen the robustness and transparency of computational social science findings against alternative methodological decisions. We conduct multiverse analyses of three…

Other Statistics · Statistics 2026-05-20 Maximilian Linde, Jun Sun, Paul Balluff, Danica Radovanović, Chung-hong Chan

Causal treatment effect decompositions with time-to-event outcomes under competing events

Inference about treatment effects for time-to-event outcomes is often obscured by the presence of competing events. A particularly complex situation arises when the treatment influences the occurrence of the competing event. A comprehensive…

Methodology · Statistics 2026-05-20 Mikko Valtanen, Tommi Härkänen, Jenni Lehtisalo, Tiia Ngandu, Miia Kivipelto, Kari Auranen

Probabilistic Multivariate Time Series Forecasting with Diffusion Copulas

Accurately assessing financial risk requires capturing both individual asset volatility and the complex, asymmetric dependence structures that emerge during extreme market events. While modern diffusion-based models have advanced…

Machine Learning · Statistics 2026-05-20 David Huk, Dongshan Wang, Miha Bresar

Increasing Missingness to Reduce Bias: Richardson-SGD with Missing Data

Stochastic gradient methods are central to modern large-scale learning, but their use with incomplete covariates remains delicate since imputation schemes generally introduce systematic gradient biases, as shown for linear models. In this…

Machine Learning · Statistics 2026-05-20 Ferdinand Genans, Erwan Scornet

Gaussian Approximation and Multiplier Bootstrap for Federated Linear Stochastic Approximation

In this paper, we establish Berry-Esseen-type bounds for federated linear stochastic approximation (LSA). Our results provide the first federated Gaussian approximations for LSA that explicitly capture communication-computation trade-offs…

Machine Learning · Statistics 2026-05-20 Ilya Levin, Maksim Shuklin, Eric Moulines, Paul Mangold, Sergey Samsonov

Posterior Contraction of Lévy Adaptive B-spline Regression in Besov Spaces

We investigate the asymptotic properties of the Lévy Adaptive B-spline (LABS) regression model, a Bayesian nonparametric method that incorporates B-spline kernels into the Lévy Adaptive Regression Kernel (LARK) model. LABS applies…

Machine Learning · Statistics 2026-05-20 Jeunghun Oh, Sewon Park, Jaeyong Lee

Uncertainty-Aware Ideal Point Estimation via Variational EM

Roll-call data analysis aims to estimate legislators' ideal points and quantify the associated uncertainty. Existing approaches either rely on Bayesian methods implemented via Markov chain Monte Carlo sampling or focus primarily on point…

Methodology · Statistics 2026-05-20 Kwangok Seo, Youngjo Lee, Jong Hee Park, Xinlei Wang, Johan Lim

Density-Ratio Losses for Post-Hoc Learning to Defer

We study post-hoc Learning to Defer (L2D) through the lens of ideal distributions: divergence-regularized reweightings of the data distribution under which a model attains low loss. We define deferral via the density-ratio between a model's…

Machine Learning · Statistics 2026-05-20 Alexander Soen, Ragnar Thobaben, Joakim Jaldén, Richard Nock

Inference for Fréchet Regression

Linear regression is widely used to model relationships between responses and predictors. In modern applications, one encounters data where the responses are non-Euclidean random objects situated in a metric space, paired with Euclidean…

Methodology · Statistics 2026-05-20 Wookyeong Song, Paromita Dubey, Hans-Georg Müller, Alexander Petersen

Tweedie's Formulae and Diffusion Generative Models Beyond Gaussian

Diffusion models have achieved remarkable success in generating samples from unknown data distributions. Most popular stochastic differential equation-based diffusion models perturb the target distribution by adding Gaussian noise,…

Machine Learning · Statistics 2026-05-20 Wenpin Tang, Nizar Touzi, Zikun Zhang, Xun Yu Zhou

A General Statistical Framework for Hardy-Weinberg Equilibrium Inference on the X Chromosome

Testing for Hardy-Weinberg equilibrium (HWE) is a fundamental component of genetic data analysis, widely used for quality control and model validation. Although HWE testing is well established for autosomal loci, inference on the X…

Applications · Statistics 2026-05-20 Lin Zhang, Andrew Paterson, Lei Sun

A Unified Framework for Structure-Aware Clustering and Heterogeneous Causal Graph Learning

In complex multivariate systems, interactions among variables are defined by dependency structures, often encoded as directed acyclic graphs (DAGs). However, dependency structures can vary across subjects, and ignoring this…

Machine Learning · Statistics 2026-05-20 Honglin Du, Muxuan Liang, Xiang Zhong

Factor Augmented High-Dimensional SGD

Stochastic gradient descent (SGD) is a fundamental optimization algorithm widely used in modern machine learning. In this paper, we propose Factor-Augmented SGD (FSGD), a new optimization method that leverages latent factor representations…

Machine Learning · Statistics 2026-05-20 Shubo Li, Yuefeng Han, Xiufan Yu

Open-Weight LLMs Are Often Competitive with Commercial APIs for Political Science Text Classification

Can researchers use local open-weight models instead of commercial APIs for LLM text classification? Local models avoid marginal API charges, keep data on the researcher's machine, and make exact model versions easier to preserve. I…

Applications · Statistics 2026-05-20 Hanno Hilbig

Ranking with Confidence: A Probabilistic Framework for Deterministic Ranking Methods

Rankings are central to decision-making in fields ranging from education to online platforms, yet classical deterministic methods such as the Borda count method or Copeland-type pairwise methods ignore uncertainty due to sampling noise or…

Methodology · Statistics 2026-05-20 Shunpu Zhang

Precision Physical Activity Prescription via Reinforcement Learning for Functional Actions

Physical activity (PA) plays an important role in maintaining and improving health. Daily steps have been a key PA measure that is easily accessible with common wearable devices. However, methods are lacking to recommend a personalized…

Applications · Statistics 2026-05-20 Gefei Lin, Rui Miao, Jennifer Sacheck, Xiaoke Zhang

The Spatial Cramér–von Mises Test of Independence under β-Mixing: Asymptotic Theory and Python Implementation

We derive the asymptotic distribution of the spatial Cramér–von Mises statistic for testing bivariate independence in stationary random fields on ℝ² under polynomial β-mixing dependence, and document the Python…

Methodology · Statistics 2026-05-20 Marco Mandap

Progression to the mean: A practical Bayesian workflow for the development and deployment of clinical prediction models

Clinical prediction models provide a prediction (e.g., estimated risk) for each individual, typically expressed as a point estimate derived from a deterministic function such as a logistic regression equation. Such 'plug-in' predictions…

Methodology · Statistics 2026-05-20 Mohsen Sadatsafavi, Richard D. Riley