Statistics — Scifaro

Microcanonical Hamiltonian Monte Carlo

We develop Microcanonical Hamiltonian Monte Carlo (MCHMC), a class of models that follow fixed-energy Hamiltonian dynamics, in contrast to Hamiltonian Monte Carlo (HMC), which follows the canonical distribution with different energy levels.…

Computation · Statistics 2026-05-29 Jakob Robnik, G. Bruno De Luca, Eva Silverstein, Uroš Seljak

Beyond Exchangeability: Distribution-Shift-Aware Integration of External Control Data in Randomized Trials

Randomized controlled trials (RCTs) are the gold standard for evaluating causal effects but are often costly and difficult to scale; consequently, in many applications they are augmented with external control data. Prior…

Methodology · Statistics 2026-05-28 Jiawei Shan, Yiteng Tu, Guanbo Wang, Chao Ying, Jiwei Zhao

Beyond Lipschitz: Data-Driven Robustness via Discrete Modulus of Continuity

Robustness of neural networks is commonly quantified via local or global Lipschitz constants. However, Lipschitz continuity can be overly coarse or overly restrictive as a global robustness measure, failing to capture nuanced, data-dependent…

Machine Learning · Statistics 2026-05-28 Jürgen Dölz, Michael Multerer, Michele Palma

Adaptive clinical trials based on design-optimal e-values with automatic curtailment: An application to single-arm trials with binary data

The e-value is gaining traction as a robust alternative to p-values and Bayes factors for quantifying statistical evidence. E-values are a promising tool for adaptive clinical trials because of their anytime validity: e-values ensure type I…

Methodology · Statistics 2026-05-28 Stef Baas, Judith ter Schure, Joost van Rosmalen

Sequential generalized kernel equating: Providing comparable scores across multiple test forms with nonequivalent groups and differently measured covariates

Test equating using covariates may be applied to provide comparable scores from multiple test forms when no anchor items are available. However, its performance may be compromised if some of the covariates themselves are measured using…

Methodology · Statistics 2026-05-28 Michaela Vařejková, Patrícia Martinková, Eva Potužníková

Conservative neural posterior estimation via distributionally robust training

Simulation-based inference with neural posterior estimation (NPE) often yields overconfident and unreliable posteriors under limited simulation budgets. To address this, we propose DRO-NPE, a distributionally robust approach that replaces…

Machine Learning · Statistics 2026-05-28 William Laplante, Yuga Hikida, Charita Dellaporta, François-Xavier Briol, Ayush Bharti

The Modified Egger Intercept Tests for Detecting Horizontal Pleiotropy in Two-Sample Summary-Data Mendelian Randomization

The Egger intercept (EI) test is a widely used tool to detect horizontal pleiotropy in two-sample summary-data Mendelian randomization. A significant EI test suggests that either the average pleiotropic effect differs from zero (i.e.,…

Methodology · Statistics 2026-05-28 Yilei Ma, Youpeng Su, Xin Liu, Xuanye Cui, Ping Yin, Peng Wang

Variance-Adaptive Optimal Algorithm for Reinforcement Learning with Multinomial Logit Function Approximation

Reinforcement learning with multinomial logistic (MNL) function approximation has become an important framework due to its flexibility and broad applicability. While existing studies have established regret guarantees under worst-case…

Machine Learning · Statistics 2026-05-28 Wonyoung Kim, Min-Hwan Oh, Garud Iyengar, Assaf Zeevi

Capturing the Curve: Functional Data Analysis for Validated Digital Outcome Measures

Digital health technologies enable high-frequency collection of data in near-continuous time and capture rich information about the health of individuals. The raw data collected by these devices often have a hierarchical functional…

Applications · Statistics 2026-05-28 Mia S. Tackney, Marcos Matabuena, Marco Palma, Michael Wester, Claire Maassen, Thomas Krammer, Julian Mustroph, Peter H. Charlton, James Carpenter, Sofia S. Villar

Decision-focused learning for optimal PV-Battery scheduling

The use of residential photovoltaics has increased dramatically in recent years. With battery systems becoming more affordable, the optimal operation of a photovoltaic-battery system can bring significant savings to households. Optimal…

Machine Learning · Statistics 2026-05-28 Joris Depoortere, Hussain Kazmi, Johan Driesen

Counterfactually Fair Regression via Optimal Transport

We consider the problem of learning a counterfactually fair regressor. We adopt a causal uncertainty view in which counterfactual fairness is defined with resampled noise. We focus on obtaining theoretical fairness guarantees for a new…

Machine Learning · Statistics 2026-05-28 M. Generali Lince, S. Gaucher, J-J. Vie, P. Loiseau

Geometry of Relaxed Fair Regression: A Unified Framework for Aware and Unaware Settings

Fairness-accuracy trade-offs are a central concern in the deployment of fairness-aware machine learning methods. When sensitive attributes are unavailable at inference time (the so-called unawareness setting), principled methods for obtaining…

Machine Learning · Statistics 2026-05-28 M. Generali Lince, V. Divol, R. Flamary, S. Gaucher, P. Loiseau

How to measure intra-physician variability in clinical decision-making?

Intra-physician prescribing variability, the probability that one physician issues discordant decisions for two patients deemed comparable on observed covariates, has a major impact on quality of care, safety, and cost. However, there are no…

Applications · Statistics 2026-05-28 Alaedine Benani, Pierre Meneton, Emmanuel Messas, Liza Hettal, Sai Sagireddy, Damien Grosgeorge, Jérôme Salomon, Sylvain Bodard, Xavier Tannier

Identifying Direct Causal Effects in Latent Factor Models by Accounting for Unidentified Parents

We consider linear structural equation models with explicitly modelled latent variables. In such models, observed and latent variables solve linear equations including stochastic noise terms. The goal of our work is to identify the direct…

Methodology · Statistics 2026-05-28 Tom Hochsprung, Nils Sturma, Jakob Runge, Mathias Drton, Andreas Gerhardus

A computationally-tractable measure of global sensitivity for sampling-based Bayesian inference

Bayesian inference can often be sensitive to the choice of hyperparameters of the prior or likelihood, yet defining and quantifying this sensitivity in a principled and computationally feasible way remains challenging in practice.…

Methodology · Statistics 2026-05-28 Arina Odnoblyudova, Charita Dellaporta, François-Xavier Briol

The conditional-mean barrier: From deterministic regression to conditional distribution learning

Many problems in computational science and engineering become one-to-many after coarse graining, partial observation, or inverse reconstruction: a resolved state may not determine a unique subgrid forcing, a structural descriptor may not…

Machine Learning · Statistics 2026-05-28 Junfeng Chen

Deep Neural Network Training as Random Effects: An Optimization-Inference Duality

Deep neural networks (DNNs) have achieved remarkable empirical success, yet their training dynamics are understood mainly through optimization rather than statistical principles. Here we develop a statistical framework for DNN training in…

Machine Learning · Statistics 2026-05-28 Minhao Yao, Ruoyu Wang, Xihong Lin, Lin Liu, Zhonghua Liu

Multi-Teacher Knowledge Distillation via Teacher-Informed Mixture Priors

Knowledge distillation is a powerful method for model compression, enabling the efficient deployment of complex deep learning models (teachers), including large language models. However, its underlying statistical mechanisms remain unclear,…

Methodology · Statistics 2026-05-28 Luyang Fang, Yongkai Chen, Jiazhang Cai, Ping Ma, Wenxuan Zhong

Is Backpropagation Optimal? When Synthetic Gradients Improve Sample Efficiency

Backpropagation is the default learning rule for artificial neural networks and is often treated as the settled approach whenever differentiability is available. In this work, we revisit this convention through a theoretical lens of sample…

Machine Learning · Statistics 2026-05-28 Yibo Jacky Zhang, Zeyu Tang, Sanmi Koyejo

A Bayesian Hierarchical Generalization of Empirical Bayes for Crash Rate Estimation with Missing Traffic Volume

The Empirical Bayes (EB) procedure of Hauer et al. (2002) is the workhorse of highway safety analysis: it combines a Safety Performance Function with observed crash counts to produce shrinkage estimates of segment-level crash rates. EB…

Applications · Statistics 2026-05-28 Lars Skaug