Statistics — Scifaro

Improved Guarantees for Heterogeneous Treatment-Effect Estimation via Matrix Completion

A central goal of modern causal inference is estimating heterogeneous treatment effects to answer questions like "how does an intervention affect each unit," rather than only on average. We study this problem with panel data where we…

Machine Learning · Statistics 2026-05-29 Anay Mehrotra, Phuc Tran, Van H. Vu, Manolis Zampetakis

Leave a Window Out: Modifying the Jackknife for Predictive Inference in Time Series

Conformal prediction methods enjoy strong theoretical and empirical predictive inference performance, provided the data is exchangeable, and predictors are trained in a memoryless fashion. However, these assumptions and constraints are…

Machine Learning · Statistics 2026-05-29 Hanyang Jiang, Rina Foygel Barber, Ashwin Pananjady, Yao Xie

MoSAIC: Multi-Resolution Spatial Regression Analysis of Cellular Colocalizations in Cancer Imaging

Hierarchical multiplex imaging approaches generate spatially resolved single-cell measurements across multiple, spatially organized fields of view (FOVs) within patient tumor specimens, thereby enabling systematic investigation of how the…

Methodology · Statistics 2026-05-29 Jessica Aldous, Michele Peruzzi, Maria Masotti, Aaron Udager, Allison May, Evan Keller, Veerabhadran Baladandayuthapani

Wasserstein Contraction of Coordinate Ascent Variational Inference

We study the contraction in Wasserstein distance of the coordinate ascent variational inference algorithm. This is shown to hold under a transport-information inequality at the fixed points and a functional smoothness condition. The results…

Machine Learning · Statistics 2026-05-29 Rocco Caprio, Adrien Corenflos, Sam Power

Multi-source land-use emissions reveal rising airborne fraction

The airborne fraction is the share of anthropogenic carbon dioxide emissions that remains in the atmosphere and is a key indicator of carbon-cycle response and remaining carbon budgets under continued emissions. Whether this share is rising…

Applications · Statistics 2026-05-29 J. Eduardo Vera-Valdés

Cellwise Robust Discriminant Analysis

Classical discriminant analysis (DA) is based on the mean and empirical covariance matrix of each class, both of which are sensitive to outliers in the data. In the past the focus was on casewise outliers, that is, datapoints that lie far…

Methodology · Statistics 2026-05-29 Fabio Centofanti, Can Hakan Dagidir, Mia Hubert, Peter J. Rousseeuw

Visual Spatial Learning: Single-Field Spatial Interpolation Using Convolutional Neural Networks

Predicting a complete spatially correlated field from sparse observations is a fundamental challenge in spatial statistics and environmental modelling. Classical interpolation methods such as Kriging rely on Gaussian process assumptions and…

Machine Learning · Statistics 2026-05-29 Daniel Tinoco, Raquel Menezes, Carlos Baquero, Alexandra Silva

High-Dimensional Data with Measurement Error

In many important statistical analyses, the number of covariates $p$ often exceeds the data size $n$, a regime commonly referred to as high-dimensional. While considerable progress has been made in high-dimensional regression under the…

Methodology · Statistics 2026-05-29 Herman Tesso, Georges Nguefack-Tsague

Leveraging Large Language Models to Improve Precision in Randomized Controlled Trials

Large language models (LLMs) are increasingly used in statistical research and applications. However, they are also notorious for unreliable or biased information. Here, we explore whether LLMs can be used to improve the precision of…

Applications · Statistics 2026-05-29 Jaylin Lowe, Adam Sales, Johann A. Gagnon-Bartsch

Diffusion Models Are Statistically Optimal for Learning Low-Dimensional Multi-Modal Distributions

Score-based diffusion models have demonstrated remarkable empirical success in learning high-dimensional distributions, particularly those exhibiting low-dimensional and multi-modal structures. However, theoretical understanding of their…

Machine Learning · Statistics 2026-05-29 Jingda Wu, Changxiao Cai

Credible rectangles for high-dimensional posterior comparison

We propose a Bayesian framework for uncertainty quantification and comparison in brain connectivity graph analysis. Standard graph-based approaches typically rely on point estimates of correlation matrices, overlooking the uncertainty…

Methodology · Statistics 2026-05-29 Alice Chevaux, Julyan Arbel, Guillaume Kon Kam King, Sophie Achard

Constructing Contact and Connectivity Matrices for Infectious Disease Modelling

Contact (or mixing, or more generally connectivity) matrices are a fundamental component of modelling and inference for infectious disease epidemiology. Their structure and parametrisation directly account for the frequency of interactions…

Applications · Statistics 2026-05-29 Xiahui Li, Dongni Zhang, Neha Bansal, Jessica R. E. Bridgen, Chris Jewell, Emma McBryde, Glenn Marion, Emily Nixon, Philip D. O'Neill, David J. Pascall, Lorenzo Pellis, Simon E. F. Spencer, Panayiota Touloupou, Lloyd Chapman, Ben Swallow

Identification-Robust Testing in Endogenous Functional Linear Regression with Weak or Irrelevant Auxiliary Variables

We develop dimension-reduction-free tests for the slope function in functional linear regression when the functional regressor may be endogenous or measured with error. The tests are based on a functional moment condition induced by an…

Methodology · Statistics 2026-05-29 Won-Ki Seo

Modifying causal models to distinguish between transient and lasting causal effects

This paper considers how to classify the effects of interventions in causal models for outcomes and exposures observed over time. First, we demonstrate the limitations of the most common uses of potential outcomes and causal directed…

Methodology · Statistics 2026-05-29 Russell Steele, Naftali Weinberger, Tess Baker, Ian Shrier

Statistical Tapers for Correlation-Based Localization in Ensemble Data Assimilation

Localization is essential in ensemble-based data assimilation because finite ensembles produce noisy covariance estimates, causing spurious updates and excessive loss of ensemble variance. In subsurface applications, localization is usually…

Methodology · Statistics 2026-05-29 Alexandre A. Emerick, Vinicius Luiz Santos Silva

Joint Model and Data Sparsification via the Marginal Likelihood

Sparse recovery in linear systems underpins applications from signal processing to high-dimensional regression. Sparse Bayesian Learning, grounded in the principle of automatic relevance determination (ARD), offers a practical Bayesian…

Machine Learning · Statistics 2026-05-29 Alexander Timans, Thomas Möllenhoff, Christian A. Naesseth, Mohammad Emtiyaz Khan, Eric Nalisnick

Fisher's ideas and the design of field experiments in agronomy and plant breeding

R. A. Fisher was one of the greatest scientists of the last century. He made many ground-breaking contributions, so many indeed that it seems almost impossible to list all of them. His revolutionary contributions to the design of…

Methodology · Statistics 2026-05-29 Hans-Peter Piepho

Instance-dependent Stochastic Lipschitz bandit

We study the Lipschitz bandit problem, where a learner sequentially maximizes an unknown Lipschitz function $f$ over a domain $\mathcal{X} \subset [0,1]^d$ using noisy pointwise evaluations. Existing regret bounds are either worst-case,…

Machine Learning · Statistics 2026-05-29 Marius Potfer, Vianney Perchet

A Jensen-Shannon divergence based $k$--$NN$ algorithm for missing value imputation in compositional data

A novel nonparametric method to impute missing values in compositional data is developed. The method is based on the $k$--$NN$ algorithm, utilizes the Jensen-Shannon divergence and employs the Fréchet mean to allow for more flexibility…

Methodology · Statistics 2026-05-29 Michail Tsagris, Connie Stewart, Abdulaziz Alenazi

Eigen-Spike Emergence and Quadratic Equivalents for Conjugate Kernels on Nonlinearly Separable Data

Recent work in random matrix theory (RMT) has developed the notion of deterministic equivalents: typically linear surrogate models that approximate the spectral behavior of large nonlinear random matrices, such as nonlinear feature maps in…

Machine Learning · Statistics 2026-05-29 Collin Cranston, Zhichao Wang, Todd Kemp, Michael W. Mahoney