Statistics — Scifaro

Improved Guarantees for Heterogeneous Treatment-Effect Estimation via Matrix Completion

A central goal of modern causal inference is estimating heterogeneous treatment effects to answer questions like "how does an intervention affect each unit?" rather than only how it acts on average. We study this problem with panel data where we…

Machine Learning · Statistics 2026-05-29 Anay Mehrotra, Phuc Tran, Van H. Vu, Manolis Zampetakis

Leave a Window Out: Modifying the Jackknife for Predictive Inference in Time Series

Conformal prediction methods enjoy strong theoretical and empirical predictive inference performance, provided the data is exchangeable and predictors are trained in a memoryless fashion. However, these assumptions and constraints are…

Machine Learning · Statistics 2026-05-29 Hanyang Jiang, Rina Foygel Barber, Ashwin Pananjady, Yao Xie

modelimportance: An R package for evaluating model importance within a multi-model ensemble

Ensemble forecasts are commonly used to support decision-making and policy planning across various fields because they often offer improved accuracy and stability compared to individual models. As each model has its own unique…

Computation · Statistics 2026-05-29 Minsu Kim, Li Shandross, Evan L. Ray, Nicholas G. Reich

Wasserstein Contraction of Coordinate Ascent Variational Inference

We study the contraction in Wasserstein distance of the coordinate ascent variational inference algorithm. This is shown to hold under a transport-information inequality at the fixed points and a functional smoothness condition. The results…

Machine Learning · Statistics 2026-05-29 Rocco Caprio, Adrien Corenflos, Sam Power

Visual Spatial Learning: Single-Field Spatial Interpolation Using Convolutional Neural Networks

Predicting a complete spatially correlated field from sparse observations is a fundamental challenge in spatial statistics and environmental modelling. Classical interpolation methods such as Kriging rely on Gaussian process assumptions and…

Machine Learning · Statistics 2026-05-29 Daniel Tinoco, Raquel Menezes, Carlos Baquero, Alexandra Silva

Diffusion Models Are Statistically Optimal for Learning Low-Dimensional Multi-Modal Distributions

Score-based diffusion models have demonstrated remarkable empirical success in learning high-dimensional distributions, particularly those exhibiting low-dimensional and multi-modal structures. However, theoretical understanding of their…

Machine Learning · Statistics 2026-05-29 Jingda Wu, Changxiao Cai

Accurate and Efficient MCMC for Latent Position Models

Latent position models (LPMs) are a large and popular class of models for random graphs. However, fitting Bayesian LPMs is computationally challenging: computing the likelihood even once takes time that is quadratic in the number of…

Computation · Statistics 2026-05-29 Zonghao Li, Aaron Smith

Joint Model and Data Sparsification via the Marginal Likelihood

Sparse recovery in linear systems underpins applications from signal processing to high-dimensional regression. Sparse Bayesian Learning, grounded in the principle of automatic relevance determination (ARD), offers a practical Bayesian…

Machine Learning · Statistics 2026-05-29 Alexander Timans, Thomas Möllenhoff, Christian A. Naesseth, Mohammad Emtiyaz Khan, Eric Nalisnick

Instance-dependent Stochastic Lipschitz bandit

We study the Lipschitz bandit problem, where a learner sequentially maximizes an unknown Lipschitz function $f$ over a domain $\mathcal{X} \subset [0,1]^d$ using noisy pointwise evaluations. Existing regret bounds are either worst-case,…

Machine Learning · Statistics 2026-05-29 Marius Potfer, Vianney Perchet

Eigen-Spike Emergence and Quadratic Equivalents for Conjugate Kernels on Nonlinearly Separable Data

Recent work in random matrix theory (RMT) has developed the notion of deterministic equivalents: typically linear surrogate models that approximate the spectral behavior of large nonlinear random matrices, such as nonlinear feature maps in…

Machine Learning · Statistics 2026-05-29 Collin Cranston, Zhichao Wang, Todd Kemp, Michael W. Mahoney

Matching Rates and Optimal Allocation for Federated Probe-Logit Distillation under Heterogeneous Bandwidth Budgets

In federated language modeling, $K$ nodes each hold $n$ samples but cannot pool data or exchange full-precision gradients or weights. We study the minimax rate at which a conditional distribution over $V$ tokens can be estimated when each…

Machine Learning · Statistics 2026-05-29 Prasanjit Dubey, Xiaoming Huo

`pandemonium`: High Dimensional Analysis in Linked Spaces

A common challenge in data analysis is uncovering relationships between predictors and responses in problems involving large numbers of both. When the number of predictors and responses is limited, visual approaches are particularly…

Computation · Statistics 2026-05-29 Gabriel McCoy, German Valencia, Ursula Laa

Deep Optimal Individualized Treatment Rules for Bivariate Survival Outcomes via Adaptive Prediction-Powered Learning

In randomized trials involving multiple treatments, bivariate survival outcomes present significant analytical challenges for making decisions. This paper addresses the problem of deriving optimal individualized treatment rules to maximize…

Machine Learning · Statistics 2026-05-29 Kun Ren, Yifan Cui, Wen Su

Prediction-Powered Inference Across Many Tasks for AI Evaluation & Social Science Research

Many applications require statistically valid inference across many related tasks, while using only a handful of high-quality labels per hypothesis. In AI evaluation, these tasks may correspond to model behaviors across prompts, subgroups,…

Machine Learning · Statistics 2026-05-29 Nicolas Emmenegger, Ellery Stahler, Chara Podimata

Neural Posterior Estimation for Spatial Individual-Level Epidemic Models

Spatial individual-level models (ILMs) provide a flexible framework for modelling infectious disease transmission across populations with known locations. Bayesian inference for these models relies on Markov chain Monte Carlo (MCMC), which…

Computation · Statistics 2026-05-29 Yicheng Mao, Rob Deardon

Anytime-Valid Federated Conformal RAG for LLM Swarms

Federated Conformal RAG (FC-RAG) provides distribution-free coverage for a bandwidth-limited swarm of weak language models, but only at a fixed horizon. We extend it to anytime-valid sequential coverage: validity at every stopping time,…

Machine Learning · Statistics 2026-05-29 Prasanjit Dubey, Xiaoming Huo

Dynamics of Stochastic Momentum with Sparse Updates in High Dimensions

Existing theory of momentum assumes that gradients arrive at every parameter at a roughly constant rate, an assumption violated in practice by heavy-tailed data distributions and modern architectures. We theoretically analyze the dynamics…

Machine Learning · Statistics 2026-05-29 Katie Everett, Elliot Paquette

Bridging Maximum Likelihood and Optimal Transport for Efficient Inference and Model Selection in Stochastic Block Models

We study inference in stochastic block models (SBMs) through the lens of optimal transport (OT). We first establish that maximum likelihood variational inference (MLVI) can be interpreted as a semi-relaxed Gromov-Wasserstein (srGW)…

Machine Learning · Statistics 2026-05-29 Simon Queric, Cédric Vincent-Cuaz, Charles Bouveyron, Marco Corneli

Insurance Pricing Optimization via Off-Policy Evaluation

Traditional insurance pricing relies on risk-based principles that ensure actuarial fairness and solvency but do not explicitly account for policyholders' price sensitivity. We formulate insurance pricing as a decision-making problem and…

Machine Learning · Statistics 2026-05-29 Sascha Günther, Dimitri Semenovich, Mario V. Wüthrich

Triangular-Reference Schrödinger Bridges for Time Series Generation

We introduce Triangular-Reference Schrödinger Bridges for Time Series (TR-SBTS), a conservative extension of the SBTS framework in which the Brownian reference is replaced by an intervalwise frozen, possibly degenerate diffusion…

Machine Learning · Statistics 2026-05-29 Gabriele Bocchi