Statistics — Scifaro

Permutation-Invariant Spectral Learning via Dyson Diffusion

Diffusion models are central to generative modeling and have been adapted to graphs by diffusing adjacency matrix representations. The challenge of having up to $n!$ such representations for graphs with $n$ nodes is only partially mitigated…

Machine Learning · Statistics · 2026-05-29 · Tassilo Schwarz, Cai Dieball, Constantin Kogler, Renaud Lambiotte, Arnaud Doucet, Aljaž Godec, George Deligiannidis

SpeedCP: Fast Kernel-based Conditional Conformal Prediction

Conformal prediction provides distribution-free prediction sets with finite-sample conditional guarantees. We build upon the RKHS-based framework of Gibbs et al. (2023), which leverages families of covariate shifts to provide approximate…

Methodology · Statistics · 2026-05-29 · Yating Liu, Yeo Jin Jung, Zixuan Wu, So Won Jeong, Claire Donnat

Optimal Stopping for Sequential Bayesian Experimental Design

Sequential Bayesian experimental design typically assumes that the number of experiments is fixed before data collection begins. In practical campaigns, however, experimentation may need to terminate early because additional measurements…

Methodology · Statistics · 2026-05-29 · Chen Cheng, Xun Huan

SADA: Safe and Adaptive Aggregation of Multiple Black-Box Predictions in Semi-Supervised Learning

Semi-supervised learning (SSL) arises in practice when labeled data are scarce or expensive to obtain, while large quantities of unlabeled data are readily available. With the growing adoption of machine learning techniques, it has become…

Machine Learning · Statistics · 2026-05-29 · Jiawei Shan, Zhifeng Chen, Yiming Dong, Yazhen Wang, Jiwei Zhao

Risk-averse Fair Multi-class Classification

We develop a new classification framework based on the theory of coherent risk measures and systemic risk. The proposed approach is suitable for multi-class problems when the data is noisy, scarce (relative to the dimension of the problem),…

Machine Learning · Statistics · 2026-05-29 · Darinka Dentcheva, Xiangyu Tian

From Sublinear to Linear: Local Convergence in Finite-Width Networks via Locally Polyak-Lojasiewicz Regions

We study local linear convergence of gradient descent for finite-width feedforward networks under the squared empirical loss. Prior work shows that GD can remain confined to a Locally Quasi-Convex Region (LQCR) around initialization, but…

Machine Learning · Statistics · 2026-05-29 · Agnideep Aich, Ashit Baran Aich, Bruce Wade

Position: Stop Chasing the C-index when Evaluating Survival Analysis Models

The current state of evaluation in survival analysis is plagued by the persistent use of evaluation metrics in ways that are misaligned with the stated modeling objective. In addition, many such evaluations are based on censoring…

Methodology · Statistics · 2026-05-29 · Christian Marius Lillelund, Shi-ang Qi, Russell Greiner, Christian Fischer Pedersen

rd2d: Causal Inference in Boundary Discontinuity Designs

Boundary Discontinuity (BD) designs are used in empirical research to learn about causal treatment effects along a continuous assignment boundary defined by a bivariate score. These designs are also known as multi-score regression…

Methodology · Statistics · 2026-05-29 · Matias D. Cattaneo, Rocio Titiunik, Ruiqi Rae Yu

Distributed Generalized Linear Models: A Privacy-Preserving Approach

This paper presents a novel approach to classical linear regression, enabling model computation from data streams or in a distributed setting while preserving data privacy in federated environments. We extend this framework to generalized…

Computation · Statistics · 2026-05-29 · Daniel Tinoco, Raquel Menezes, Carlos Baquero

Invariant Image Reparameterisation: Bridging Symbolic and Numerical Methods for Identifiability Analysis, Model Reduction, and Prediction

Structural and practical parameter non-identifiability issues are common when mathematical models are used to interpret data. Such issues motivate model reparameterisation and reduction methods. Here, we consider Invariant Image…

Applications · Statistics · 2026-05-29 · Oliver J. Maclaren, Ruanui Nicholson, Joel A. Trent, Joshua Rottenberry, Matthew Simpson

Noise-Aware Differentially Private Variational Inference

Differential privacy (DP) provides robust privacy guarantees for statistical inference, but this can lead to unreliable results and biases in downstream applications. While several noise-aware approaches have been proposed which integrate…

Machine Learning · Statistics · 2026-05-29 · Talal Alrawajfeh, Joonas Jälkö, Antti Honkela

Robust Principal Components by Casewise and Cellwise Weighting

Principal component analysis (PCA) is a fundamental tool for analyzing multivariate data. Here the focus is on dimension reduction to the principal subspace, characterized by its projection matrix. The classical principal subspace can be…

Methodology · Statistics · 2026-05-29 · Fabio Centofanti, Mia Hubert, Peter J. Rousseeuw

Bayesian Structured Mediation Analysis With Unobserved Confounders

We explore methods to reduce the impact of unobserved confounders on the causal mediation analysis of high-dimensional mediators with spatially smooth structures, such as brain imaging data. The key approach is to incorporate the latent…

Methodology · Statistics · 2026-05-29 · Yuliang Xu, Shu Yang, Jian Kang

Bayesian modeling of multi-species labeling errors in ecological studies

Ecological and conservation studies monitoring bird communities typically rely on species classification based on bird vocalizations. Historically, this has been based on expert volunteers going into the field and making lists of the bird…

Methodology · Statistics · 2026-05-29 · Haoxuan Wang, Patrik Lauha, David B. Dunson

Parametric Bootstrap for Fixed Edge-Probability Network Models

This paper studies parametric bootstrap methods for network data, with the goal of quantifying the uncertainty of network statistics of interest. While existing network resampling methods primarily focus on count statistics under…

Methodology · Statistics · 2026-05-29 · Zhixuan Shao, Can M. Le

Second-level global sensitivity analysis of numerical simulators with application to an accident scenario in a sodium-cooled fast reactor

Numerical simulators are widely used to model physical phenomena, and global sensitivity analysis (GSA) aims at studying the global impact of the input uncertainties on the simulator output. To perform GSA, statistical tools based on…

Methodology · Statistics · 2026-05-29 · Anouar Meynaoui, Amandine Marrel, Béatrice Laurent

Microcanonical Hamiltonian Monte Carlo

We develop Microcanonical Hamiltonian Monte Carlo (MCHMC), a class of models that follow fixed-energy Hamiltonian dynamics, in contrast to Hamiltonian Monte Carlo (HMC), which follows the canonical distribution with different energy levels.…

Computation · Statistics · 2026-05-29 · Jakob Robnik, G. Bruno De Luca, Eva Silverstein, Uroš Seljak

Beyond Exchangeability: Distribution-Shift-Aware Integration of External Control Data in Randomized Trials

Randomized controlled trials (RCTs) are the gold standard for evaluating causal effects but are often costly and difficult to scale; consequently, they are frequently augmented with auxiliary external controls in many applications. Prior…

Methodology · Statistics · 2026-05-28 · Jiawei Shan, Yiteng Tu, Guanbo Wang, Chao Ying, Jiwei Zhao

Beyond Lipschitz: Data-Driven Robustness via Discrete Modulus of Continuity

Robustness of neural networks is commonly quantified via local or global Lipschitz constants. However, Lipschitz continuity can be overly coarse or overly restrictive as a global robustness measure, failing to capture nuanced, data-dependent…

Machine Learning · Statistics · 2026-05-28 · Jürgen Dölz, Michael Multerer, Michele Palma

Adaptive clinical trials based on design-optimal e-values with automatic curtailment: An application to single-arm trials with binary data

The e-value is gaining traction as a robust alternative to p-values and Bayes factors for quantifying statistical evidence. e-values are a promising method for adaptive clinical trials due to their anytime-validity: e-values ensure type I…

Methodology · Statistics · 2026-05-28 · Stef Baas, Judith ter Schure, Joost van Rosmalen