Statistics — Scifaro

Approximating full conformal prediction: distribution free guarantees via the tournament correction

Conformal prediction is a framework for providing prediction intervals with distribution-free validity, guaranteeing predictive coverage for data drawn from any distribution. Its two main variants are full conformal prediction and split…

Methodology · Statistics 2026-05-29 Aabesh Bhattacharyya, Boxuan Zhang, Rina Foygel Barber

Coating Breakdown Prediction for Ships and Inspection Planning

Marine corrosion significantly reduces a ship's availability, increases costs of operation and could impact safety. Protective coatings mitigate these risks, but their effectiveness deteriorates over time. Early detection of coating…

Applications · Statistics 2026-05-29 Huy Truong-Ba, Michael E. Cholette, Geoffrey Will, Marc Hartmann

Bayesian reversal of the liquid level trajectory in a draining tank for pollution forensics

Storage tanks for hazardous liquids are common in industry and agriculture. During a pollution incident, liquid may drain from a storage tank through a small hole, crack, or pipe. After containing the leak, estimating the discharged volume…

Applications · Statistics 2026-05-29 Kyla D. Jones, Gbenga Fabusola, Alexander W. Dowling, Cory M. Simon

A Latent Variable Model for Response Times with Individual-Specific Change-Points

Response times collected in computerised assessments provide information about the underlying response process and may exhibit within-person variation over the course of a test. We propose a latent variable model for log response times that…

Methodology · Statistics 2026-05-29 Gabriel Wallin, Nivedita Bhaktha

Neural Posterior Estimation for Spatial Individual-Level Epidemic Models

Spatial individual-level models (ILMs) provide a flexible framework for modelling infectious disease transmission across populations with known locations. Bayesian inference for these models relies on Markov chain Monte Carlo (MCMC), which…

Computation · Statistics 2026-05-29 Yicheng Mao, Rob Deardon

Anytime-Valid Federated Conformal RAG for LLM Swarms

Federated Conformal RAG (FC-RAG) provides distribution-free coverage for a bandwidth-limited swarm of weak language models, but only at a fixed horizon. We extend it to anytime-valid sequential coverage: validity at every stopping time,…

Machine Learning · Statistics 2026-05-29 Prasanjit Dubey, Xiaoming Huo

Efficient First-Order Methods for Estimating Generalized Additive Index Models

Generalized additive index models (GAIMs) offer a flexible semiparametric framework for capturing complex data relationships, balancing the interpretability of parametric models with the flexibility of nonparametric approaches. However,…

Methodology · Statistics 2026-05-29 Ziyu Peng, Linglingzhi Zhu, Yao Xie

Bayesian Inference of Mixing and Transmission Heterogeneity in Stratified Disease Surveillance Models

When surveillance data of infectious disease incidence (e.g. weekly case counts) are disaggregated by demographic indicators, disparities in long-run health outcomes between these groups become apparent. Accurate identification of high-risk…

Methodology · Statistics 2026-05-29 Miles Moran, Rob Trangucci, Lisa Madsen

Dynamics of Stochastic Momentum with Sparse Updates in High Dimensions

Existing theory of momentum assumes that gradients arrive at every parameter at a roughly constant rate, an assumption violated in practice by heavy-tailed data distributions and modern architectures. We theoretically analyze the dynamics…

Machine Learning · Statistics 2026-05-29 Katie Everett, Elliot Paquette

Bridging Maximum Likelihood and Optimal Transport for Efficient Inference and Model Selection in Stochastic Block Models

We study inference in stochastic block models (SBMs) through the lens of optimal transport (OT). We first establish that maximum likelihood variational inference (MLVI) can be interpreted as a semi-relaxed Gromov-Wasserstein (srGW)…

Machine Learning · Statistics 2026-05-29 Simon Queric, Cédric Vincent-Cuaz, Charles Bouveyron, Marco Corneli

Identification and Inference for Structural Accelerated Failure Time Models via Instrument Interactions

We study causal inference for time-to-event outcomes under right censoring in the presence of unmeasured confounding. Focusing on structural accelerated failure time models, we develop an identification and inference framework that exploits…

Methodology · Statistics 2026-05-29 Qiushi Bu, Wen Su, Xinyu Zhang, Xingqiu Zhao, Zhonghua Liu

Insurance Pricing Optimization via Off-Policy Evaluation

Traditional insurance pricing relies on risk-based principles that ensure actuarial fairness and solvency but do not explicitly account for policyholders' price sensitivity. We formulate insurance pricing as a decision-making problem and…

Machine Learning · Statistics 2026-05-29 Sascha Günther, Dimitri Semenovich, Mario V. Wüthrich

Triangular-Reference Schrödinger Bridges for Time Series Generation

We introduce Triangular-Reference Schrödinger Bridges for Time Series (TR-SBTS), a conservative extension of the SBTS framework in which the Brownian reference is replaced by an intervalwise frozen, possibly degenerate diffusion…

Machine Learning · Statistics 2026-05-29 Gabriele Bocchi

Stop Suppressing the Tail: Causal Inference for Extreme Events

Estimating how an outcome responds to a continuous treatment (the Average Dose-Response Function, or ADRF) is a core causal-inference primitive. However, when outcomes possess heavy tails, standard robust double machine learning (DML)…

Machine Learning · Statistics 2026-05-29 Eichi Uehara

Semiparametric Inference for Causal Effects on Functional Outcomes

Difference-in-differences (DiD) is a cornerstone of causal inference, yet extending it to functional outcomes is not a routine scalar generalization; rather, it entails three fundamental challenges in identification, inference, and…

Methodology · Statistics 2026-05-29 Junzhu Nie, Chengxiu Ling, Mengfei Ran

Nonparametric Regression via Tree-Guided Feature Aggregation

In regression problems where covariates are naturally organized in a hierarchical tree structure, a central challenge is to select the resolution at which covariates enter the model. Determining this level of feature aggregation is of…

Methodology · Statistics 2026-05-29 Sithija Manage, Y. Samuel Wang, Martin T. Wells

MEDAL: Manifold Embedding Distillation via Autoencoder Learning

Low-dimensional embeddings are widely used as visual summaries of high-dimensional data and to enable downstream scientific discoveries. Yet, popular nonlinear dimension reduction methods, such as t-SNE and UMAP, are often selected based on…

Machine Learning · Statistics 2026-05-29 Irene Chang, Tarek M. Zikry, Genevera I. Allen

Variance-Aware Estimation and Inference for Michaelis–Menten Models with Heteroscedastic Errors and Clustered Measurements

Michaelis–Menten analysis is often conducted by nonlinear least squares under a constant-variance assumption, even though enzyme-kinetic data frequently display concentration-dependent heteroscedasticity and often include repeated or…

Methodology · Statistics 2026-05-29 Mijeong Kim, Minkyoung Cha, Ah Young Jeong

Online Learning-to-Defer with Varying Experts

Learning-to-Defer (L2D) methods route each query either to a predictive model or to external experts. While existing work studies this problem in batch settings, real-world deployments require handling streaming data, changing expert…

Machine Learning · Statistics 2026-05-29 Dang Hoang Duy, Yannis Montreuil, Maxime Meyer, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

Self-Supervised Laplace Approximation for Bayesian Uncertainty Quantification

Approximate Bayesian inference typically revolves around computing the posterior parameter distribution. In practice, however, the main object of interest is often a model's predictions rather than its parameters. In this work, we propose…

Machine Learning · Statistics 2026-05-29 Julian Rodemann, Alexander Marquard, Thomas Augustin, Michele Caprio