Statistics — Scifaro

A Refined Generalization Analysis for Extreme Multi-class Supervised Contrastive Representation Learning

Contrastive Representation Learning (CRL) has achieved strong empirical success in multiple machine learning disciplines, yet its theoretical sample complexity remains poorly understood. Existing analyses usually assume that input tuples…

Machine Learning · Statistics 2026-05-29 Nong Minh Hieu, Antoine Ledent

Fast and accurate conditioning for large-scale and online Gaussian process prediction problems

Gaussian Process (GP) models provide a flexible framework for prediction and uncertainty quantification. For most covariance functions, however, exact GP prediction with $n$ points scales as $\mathcal{O}(n^3)$, making it prohibitively…

Computation · Statistics 2026-05-29 Samanyu Arora, Christopher J. Geoga

Trust Me, I'm a Doctor?

Clinical trials usually target average treatment effects, but treatment decisions are made for individuals. This tension motivates a common criticism of evidence-based medicine: a treatment that is beneficial on average may be inappropriate…

Applications · Statistics 2026-05-29 Zach Shahn, Mats Stensrud

Estimating Continuous Treatment Effects with Two-Stage Kernel Ridge Regression

We study the problem of estimating the effect function for a continuous treatment, which maps each treatment value to a population-averaged outcome. A central challenge in this setting is confounding: treatment assignment often depends on…

Methodology · Statistics 2026-05-29 Seok-Jin Kim, Kaizheng Wang

Adaptive Learning via Off-Model Training and Importance Sampling for Fully Non-Markovian Optimal Stochastic Control. Complete version

This paper studies continuous-time stochastic control problems whose controlled states are fully non-Markovian and depend on unknown model parameters. Such problems arise naturally in path-dependent stochastic differential equations,…

Machine Learning · Statistics 2026-05-29 Dorival Leão, Alberto Ohashi, Simone Scotti, Adolfo M. D da Silva

Beyond Augmented-Action Surrogates for Multi-Expert Learning-to-Defer

A learning-to-defer (L2D) system decides, for each input, whether to predict on its own or to hand it to one of several available experts. The well-established recipe trains the classifier and router jointly by treating the $K$ classes and…

Machine Learning · Statistics 2026-05-29 Yannis Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

MEC: Machine-Learning-Assisted Generalized Entropy Calibration for Semi-Supervised Mean Estimation

Obtaining high-quality labels is costly, whereas unlabeled covariates are often abundant, motivating semi-supervised inference methods with reliable uncertainty quantification. Prediction-powered inference (PPI) leverages a machine-learning…

Machine Learning · Statistics 2026-05-29 Se Yoon Lee, Jae Kwang Kim

Measure flow path recovery in Bayes Hilbert spaces

We study the ill-posed problem of recovering a probability measure flow from finitely many moving localized sensors using a Bayes Hilbert framework. Relative to a fixed reference probability measure, a probability law is represented by its…

Machine Learning · Statistics 2026-05-29 S. David Mis, Maarten V. de Hoop

Estimating within-cluster and between-cluster spillover effects in randomized saturation designs

Randomized saturation designs are two-stage experiments: they first randomly assign treatment probabilities over the clusters and then randomly assign the treatment to the units within the clusters. The existing literature on randomized…

Methodology · Statistics 2026-05-29 Sizhu Lu, Lei Shi, Peng Ding

Benchmarking Formula 1 results using a normal model

There is enduring interest in disentangling the effects of skill and luck in sport. A key issue in Formula 1 is distinguishing between car-level and driver-level effects. Four elite teams currently dominate Formula 1 and have won every…

Applications · Statistics 2026-05-29 John Fry, Silvio Fanzon, Mark Austin, Tom Brighton

Learning-to-Defer with Expert-Conditional Advice

Learning-to-Defer routes each input to the expert that minimizes expected cost, but it assumes that the information available to every expert is fixed at decision time. Many modern systems violate this assumption: after selecting an expert,…

Machine Learning · Statistics 2026-05-29 Yannis Montreuil, Leïna Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

Aggregate Models, Not Explanations: Improving Feature Importance Estimation

Feature-importance methods show promise in transforming machine learning models from predictive engines into tools for scientific discovery. However, due to data sampling and algorithmic stochasticity, expressive models can be unstable,…

Machine Learning · Statistics 2026-05-29 Joseph Paillard, Angel Reyero Lobo, Denis A. Engemann, Bertrand Thirion

Diffusion differentiable resampling

This paper is concerned with differentiable resampling in the context of sequential Monte Carlo (e.g., particle filtering). Drawing on reparametrisation, we propose a new resampling method that is informative and instantly differentiable,…

Machine Learning · Statistics 2026-05-29 Jennifer Rosina Andersson, Zheng Zhao

Assessing Extrapolation of Peaks Over Thresholds with Martingale Testing

We present the winning strategy for the EVA2025 Data Challenge, which aimed to estimate the probability of extreme precipitation events. These events occurred at most once in the dataset, making the challenge fundamentally one of…

Methodology · Statistics 2026-05-29 Joseph de Vilmarest, Olivier Wintenberger

BITS for GAPS: Bayesian Information-Theoretic Sampling for hierarchical GAussian Process Surrogates

We introduce Bayesian Information-Theoretic Sampling for hierarchical GAussian Process Surrogates (BITS for GAPS), a framework enabling information-theoretic experimental design of Gaussian process-based surrogate models. Unlike standard…

Machine Learning · Statistics 2026-05-29 Kyla D. Jones, Alexander W. Dowling

Scalable and Communication-Efficient Varying Coefficient Mixed Effect Models: Methodology, Theory, and Applications

Human migration exhibits complex spatiotemporal dependence driven by environmental and socioeconomic forces. Modeling such patterns at scale requires methods that accommodate many random effects while remaining feasible when raw data or…

Methodology · Statistics 2026-05-29 Lida Chalangar Jalili Dehkharghani, Li-Hsiang Lin

TabMGP: Martingale Posterior with TabPFN

Bayesian inference provides principled uncertainty quantification but is often limited by the challenges of prior and likelihood elicitation. The martingale posterior (MGP) (Fong et al., 2023) offers an alternative by replacing these…

Methodology · Statistics 2026-05-29 Kenyon Ng, Edwin Fong, David T. Frazier, Jeremias Knoblauch, Susan Wei

Follow-the-Perturbed-Leader for Decoupled Bandits: Best-of-Both-Worlds and Practicality

We study the decoupled multi-armed bandit problem, where the learner separately selects one arm for exploration and one, possibly different, arm for exploitation at each round. In this setting, the loss of the explored arm is observed but…

Machine Learning · Statistics 2026-05-29 Chaiwon Kim, Jongyeong Lee, Min-hwan Oh

Adversarial Robustness in One-Stage Learning-to-Defer

Learning-to-Defer (L2D) enables hybrid decision-making by routing inputs either to a predictor or to external experts. While promising, L2D is highly vulnerable to adversarial perturbations, which can not only flip predictions but also…

Machine Learning · Statistics 2026-05-29 Yannis Montreuil, Letian Yu, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

Calibrating Generative Models to Distributional Constraints

Generative models frequently suffer from miscalibration, wherein statistics of the sampling distribution, such as the fraction of generations in a given class, deviate from desired values. We frame calibration as a constrained optimization…

Machine Learning · Statistics 2026-05-29 Henry D. Smith, Nathaniel L. Diamant, Brian L. Trippe