Statistics — Scifaro

Sequential generalized kernel equating: Providing comparable scores across multiple test forms with nonequivalent groups and differently measured covariates

Test equating using covariates may be applied to provide comparable scores from multiple test forms when no anchor items are available. However, its performance may be compromised if some of the covariates themselves are measured using…

Methodology · Statistics 2026-05-28 Michaela Vařejková, Patrícia Martinková, Eva Potužníková

Conservative neural posterior estimation via distributionally robust training

Simulation-based inference with neural posterior estimation (NPE) often yields overconfident and unreliable posteriors under limited simulation budgets. To address this, we propose DRO-NPE, a distributionally robust approach that replaces…

Machine Learning · Statistics 2026-05-28 William Laplante, Yuga Hikida, Charita Dellaporta, François-Xavier Briol, Ayush Bharti

The Modified Egger Intercept Tests for Detecting Horizontal Pleiotropy in Two-Sample Summary-Data Mendelian Randomization

The Egger intercept (EI) test is a widely used tool to detect horizontal pleiotropy in two-sample summary-data Mendelian randomization. A significant EI test suggests that either the average pleiotropic effect differs from zero (i.e.,…

Methodology · Statistics 2026-05-28 Yilei Ma, Youpeng Su, Xin Liu, Xuanye Cui, Ping Yin, Peng Wang

Variance-Adaptive Optimal Algorithm for Reinforcement Learning with Multinomial Logit Function Approximation

Reinforcement learning with multinomial logistic (MNL) function approximation has become an important framework due to its flexibility and broad applicability. While existing studies have established regret guarantees under worst-case…

Machine Learning · Statistics 2026-05-28 Wonyoung Kim, Min-Hwan Oh, Garud Iyengar, Assaf Zeevi

Capturing the Curve: Functional Data Analysis for Validated Digital Outcome Measures

Digital health technologies enable high-frequency collection of data in near-continuous time and capture rich information about the health of individuals. The raw data collected by these devices often have a hierarchical functional…

Applications · Statistics 2026-05-28 Mia S. Tackney, Marcos Matabuena, Marco Palma, Michael Wester, Claire Maassen, Thomas Krammer, Julian Mustroph, Peter H. Charlton, James Carpenter, Sofia S. Villar

Decision-focused learning for optimal PV-Battery scheduling

The use of residential photovoltaics has increased dramatically in recent years. With battery systems becoming more affordable, the optimal operation of a photovoltaic-battery system can bring significant savings to households. Optimal…

Machine Learning · Statistics 2026-05-28 Joris Depoortere, Hussain Kazmi, Johan Driesen

Counterfactually Fair Regression via Optimal Transport

We consider the problem of learning a counterfactually fair regressor. We adopt a causal uncertainty view in which counterfactual fairness is defined with resampled noise. We focus on obtaining theoretical fairness guarantees for a new…

Machine Learning · Statistics 2026-05-28 M. Generali Lince, S. Gaucher, J-J. Vie, P. Loiseau

Geometry of Relaxed Fair Regression: A Unified Framework for Aware and Unaware Settings

Fairness-accuracy trade-offs are a central concern in the deployment of fairness-aware machine learning methods. When sensitive attributes are unavailable at inference time (the so-called unawareness setting), principled methods for obtaining…

Machine Learning · Statistics 2026-05-28 M. Generali Lince, V. Divol, R. Flamary, S. Gaucher, P. Loiseau

How to measure intra-physician variability in clinical decision-making?

Intra-physician prescribing variability, the probability that one physician issues discordant decisions for two patients deemed comparable on observed covariates, has a great impact on quality of care, safety and cost. However, there are no…

Applications · Statistics 2026-05-28 Alaedine Benani, Pierre Meneton, Emmanuel Messas, Liza Hettal, Sai Sagireddy, Damien Grosgeorge, Jérôme Salomon, Sylvain Bodard, Xavier Tannier

Identifying Direct Causal Effects in Latent Factor Models by Accounting for Unidentified Parents

We consider linear structural equation models with explicitly modelled latent variables. In such models, observed and latent variables solve linear equations including stochastic noise terms. The goal of our work is to identify the direct…

Methodology · Statistics 2026-05-28 Tom Hochsprung, Nils Sturma, Jakob Runge, Mathias Drton, Andreas Gerhardus

A computationally-tractable measure of global sensitivity for sampling-based Bayesian inference

Bayesian inference can often be sensitive to the choice of hyperparameters of the prior or likelihood, yet defining and quantifying this sensitivity in a principled and computationally feasible way remains challenging in practice.…

Methodology · Statistics 2026-05-28 Arina Odnoblyudova, Charita Dellaporta, François-Xavier Briol

The conditional-mean barrier: From deterministic regression to conditional distribution learning

Many problems in computational science and engineering become one-to-many after coarse graining, partial observation, or inverse reconstruction: a resolved state may not determine a unique subgrid forcing, a structural descriptor may not…

Machine Learning · Statistics 2026-05-28 Junfeng Chen

Deep Neural Network Training as Random Effects: An Optimization-Inference Duality

Deep neural networks (DNNs) have achieved remarkable empirical success, yet their training dynamics remain understood mainly from optimization rather than statistical principles. Here we develop a statistical framework for DNN training in…

Machine Learning · Statistics 2026-05-28 Minhao Yao, Ruoyu Wang, Xihong Lin, Lin Liu, Zhonghua Liu

Multi-Teacher Knowledge Distillation via Teacher-Informed Mixture Priors

Knowledge distillation is a powerful method for model compression, enabling the efficient deployment of complex deep learning models (teachers), including large language models. However, its underlying statistical mechanisms remain unclear,…

Methodology · Statistics 2026-05-28 Luyang Fang, Yongkai Chen, Jiazhang Cai, Ping Ma, Wenxuan Zhong

Is Backpropagation Optimal? When Synthetic Gradients Improve Sample Efficiency

Backpropagation is the default learning rule for artificial neural networks and is often treated as the settled approach whenever differentiability is available. In this work, we revisit this convention through a theoretical lens of sample…

Machine Learning · Statistics 2026-05-28 Yibo Jacky Zhang, Zeyu Tang, Sanmi Koyejo

A Bayesian Hierarchical Generalization of Empirical Bayes for Crash Rate Estimation with Missing Traffic Volume

The Empirical Bayes (EB) procedure of Hauer et al. (2002) is the workhorse of highway safety analysis: it combines a Safety Performance Function with observed crash counts to produce shrinkage estimates of segment-level crash rates. EB…

Applications · Statistics 2026-05-28 Lars Skaug

A Parameterization-Invariant DIC

The classic Deviance Information Criterion (DIC) is not invariant to reparameterization and can have a negative and unstable effective number of parameters. The reason for the effective number of parameters being negative is actually that…

Methodology · Statistics 2026-05-28 Xingyao Xiao, Sophia Rabe-Hesketh

Learning to target with network interference

This paper studies adaptive targeting under network interference in a bandit setting, where treatments applied to one individual may affect others through spillover effects. We consider a linear model in a sparse regime, where each…

Machine Learning · Statistics 2026-05-28 Xiaomeng Wang, Hamsa Bastani, Osbert Bastani, Zhimei Ren

Day-Ahead Electricity Price Forecasting Using a Multivariate Group Lasso Method

Electricity price signals in modern power systems exhibit complex dependence structures that render forecasting inherently challenging. Our analysis of real-world pricing signals from the California Independent System Operator (CAISO)…

Applications · Statistics 2026-05-28 Keyi Wang, Jiaxiang Ji, Mahan Mansouri, Ahmed Aziz Ezzat

Soft Specialists: $\alpha$-Rényi Ensembles for Uncertainty-Aware LLM Post-Training

Existing training approaches for large language models learn a single set of parameters, based on large volumes of data, which is typically heterogeneous, conflicting and often outright contradictory. As a result, the model is forced to…

Machine Learning · Statistics 2026-05-28 Paula Cordero-Encinar, Georgy Tyukin, Andrew B. Duncan