Statistics — Scifaro

PAC Learning with Bandit Feedback: Sharp Sample Complexity in the Realizable Setting

We study the problem of multiclass PAC learning with bandit feedback in the realizable setting. In this framework, there is an unknown data distribution over an instance space $\mathcal{X}$ and a label space $\mathcal{Y}$, as in classical…

Machine Learning · Statistics 2026-05-27 Steve Hanneke, Qinglin Meng, Shay Moran, Amirreza Shaeiri

Provably Data-driven Lagrangian Relaxation for Mixed Integer Linear Programming

Lagrangian Relaxation (LR) is a powerful technique for solving large-scale Mixed Integer Linear Programming (MILP) problems, particularly those with decomposable structures, such as vehicle routing or unit commitment problems. By relaxing the…

Machine Learning · Statistics 2026-05-27 Tung Quoc Le, Anh Tuan Nguyen, Viet Anh Nguyen

Shallow ReLU$^s$ Networks in $L^p$-Type and Sobolev Spaces: Approximation and Path-Norm Controlled Generalization

This paper studies approximation by shallow ReLU$^s$ networks, $\sigma_s(t)=\max\{0,t\}^s$, together with their generalization behavior under $\ell_1$ path-norm control. For the $L^p$-type integral spaces…

Machine Learning · Statistics 2026-05-27 Weizhao Li, Fanghui Liu, Lei Shi

A Unified Theory of Conditional Coverage in Conformal Prediction with Applications

Conformal prediction provides prediction sets with finite-sample marginal coverage, but many applications require coverage guarantees that adapt to individual test points, a subpopulation, or a structural component of the data. Existing…

Methodology · Statistics 2026-05-27 Yinjie Min, Liuhua Peng, Changliang Zou

Jacobian-Velocity Bounds for Deployment Risk Under Covariate Drift

We study long-horizon deployment of a frozen predictor under dynamic covariate shift. A time-domain Poincaré inequality first reduces temporal risk volatility to derivative energy. A Jacobian-velocity theorem then supplies the corresponding…

Machine Learning · Statistics 2026-05-27 Jonathan R. Landers

Nonparametric Instrumental Variable Analysis Without Structural Equations: Debiased Inference on Functionals of Inverse Problems with No Solutions

We consider debiased inference on finite-dimensional functionals of infinite-dimensional least-squares solutions to inverse problems as a way to avoid having to assume exact solutions exist. Such assumptions are substantive and not…

Machine Learning · Statistics 2026-05-27 Zikai Shen, Nathan Kallus, Dimitri Meunier, Houssam Zenati, Arthur Gretton, Aurélien Bibaut

Assessing Per-Sample Membership Inference Vulnerability without Retraining

Recent work in the privacy literature shows that sample-targeted membership inference attacks (MIAs) outperform untargeted approaches by a wide margin. Motivated by this observation, we address the following question: can the…

Machine Learning · Statistics 2026-05-27 Valentin Dorseuil, Jamal Atif, Olivier Cappé

Dynamic Financial Analysis (DFA) of General Insurers under Climate Change

Climate change is expected to significantly affect the physical, financial, and economic environments over the long term, posing risks to the financial health of general insurers. While general insurers typically use Dynamic Financial…

Applications · Statistics 2026-05-27 Benjamin Avanzi, Yanfeng Li, Greg Taylor, Bernard Wong

Membership Inference Risks in Quantized Models: A Theoretical and Empirical Study

Quantizing machine learning models has demonstrated its effectiveness in lowering memory and inference costs while maintaining performance levels comparable to those of the original models. In this work, we investigate the impact of…

Machine Learning · Statistics 2026-05-27 Eric Aubinais, Philippe Formont, Pablo Piantanida, Elisabeth Gassiat

Fast Spectrum Estimation of Some Kernel Matrices

In data science, individual observations are often assumed to come independently from an underlying probability space. Kernel matrices formed from large sets of such observations arise frequently, for example during classification tasks. It…

Machine Learning · Statistics 2026-05-27 Mikhail Lepilov

Robust Classification of High-Dimensional Data using Data-Adaptive Energy Distance

Classification of high-dimensional low sample size (HDLSS) data poses a challenge in a variety of real-world situations, such as gene expression studies, cancer research, and medical imaging. This article presents the development and…

Machine Learning · Statistics 2026-05-27 Jyotishka Ray Choudhury, Aytijhya Saha, Sarbojit Roy, Subhajit Dutta

DiscoverPhysics: Benchmarking LLMs for Out-of-the-Box Scientific Thinking

Frontier LLMs now perform strongly across a wide range of physics evaluations, but it is hard to disentangle genuine reasoning from recall of established science. We introduce DiscoverPhysics, an interactive benchmark that asks an LLM agent…

Machine Learning · Statistics 2026-05-26 Matt L. Wiemann, Lindsay M. Smith, Peter Melchior, Siddharth Mishra-Sharma, Andrew Gordon Wilson, Pavel Izmailov, Carolina Cuesta-Lázaro

Quantile autoregressive moving average models for ratio-based bounded time series

This paper proposes the quantile unit-log-symmetric autoregressive moving average (QULS-ARMA) model for bounded time series on the open unit interval $(0,1)$. The model extends the unit-log-symmetric family by introducing a quantile-based…

Computation · Statistics 2026-05-26 Helton Saulo, Roberto Vila, Filidor Vilca

Considering causality in the construction of molecular signatures of lifestyle exposures

Molecular signatures derived from omics data are increasingly used in epidemiological studies to characterize lifestyle exposures, either as proxies of exposure or to provide insight into disease mechanisms. These signatures are typically…

Methodology · Statistics 2026-05-26 Diana Wu, Vivian Viallon

Statistical Inference for Stochastic Gradient Descent Beyond Finite Variance

Stochastic gradient descent (SGD) is a foundational algorithm for large-scale statistical learning and stochastic optimization. However, statistical inference based on SGD iterates remains challenging when stochastic gradients have infinite…

Machine Learning · Statistics 2026-05-26 Jose Blanchet, Peter Glynn, Wenhao Yang

Weighted NPMLE for the Marginal Mean of Recurrent Events with a Competing Terminal Event

Regression modeling of recurrent and terminal events continues to present methodological challenges in survival analysis. Existing approaches either make unverifiable assumptions about the dependency structure between the two event types or…

Methodology · Statistics 2026-05-26 Anna Bellach, Michael R. Kosorok

Nonparametric Estimation via Expected Order Statistics

The empirical distribution function assigns mass $1/n$ to each of the $n$ observations in a sample. As these are highly variable, estimation error may be reduced by replacing them with estimated observations that are asymptotically less…

Methodology · Statistics 2026-05-26 Tommaso Lando, Lorenzo Tedesco

Bayesian perspectives on exponential random graph models

Exponential random graph models (ERGMs) are a widely used framework for network data, enabling hypothesis testing on the structural mechanisms underlying observed networks. Bayesian ERGMs provide principled uncertainty quantification and…

Methodology · Statistics 2026-05-26 Alberto Caimo, Isabella Gollini

High-Dimensional Change-Point Detection via Angular Kernel Statistics

We study change-point detection for high-dimensional data in regimes where inference must be performed from small batches of observations. Our primary focus is the high-dimensional, low sample size (HDLSS) regime, where the sequence length…

Methodology · Statistics 2026-05-26 Jyotishka Ray Choudhury, Yao Xie

Geometry Adaptive Counterfactual Distribution Learning with Diffusion-Guided Smoothing

We study counterfactual distribution learning for high-dimensional outcomes whose counterfactual law may concentrate near lower-dimensional structure. Standard isotropic smoothing treats all ambient directions equally, leading to…

Methodology · Statistics 2026-05-26 Kwangho Kim