Statistics — Scifaro

On the inclusion of non-concurrent controls in platform trials with an interim analysis

The analysis of platform trials can be enhanced by utilizing non-concurrent controls. Since including this data might also introduce bias in the treatment effect estimators if time trends are present, methods for incorporating…

Methodology · Statistics 2026-05-21 Pavla Krotka, Martin Posch, Marta Bofill Roig

Matrix Factorization-Based Solar Spectral Irradiance Missing Data Imputation with Uncertainty Quantification

The solar spectral irradiance (SSI) depicts the spectral distribution of solar energy flux reaching the top of the Earth's atmosphere. Daily SSI measurements constitute a matrix with spectrally (rows) and temporally (columns) resolved solar…

Applications · Statistics 2026-05-21 Yuxuan Ke, Xianglei Huang, Odele Coddington, Yang Chen

A Generalized Tangent Approximation based Variational Inference Framework for Strongly Super-Gaussian Likelihoods

Variational inference, as an alternative to Markov chain Monte Carlo sampling, has played a transformative role in enabling scalable computation for complex Bayesian models. Nevertheless, existing approaches often depend on either rigid…

Methodology · Statistics 2026-05-21 Somjit Roy, Pritam Dey, Debdeep Pati, Bani K. Mallick

A Practical Guide to Estimating Conditional Marginal Effects: Modern Approaches

This Element offers a practical guide to estimating conditional marginal effects (how treatment effects vary with a moderating variable) using modern statistical methods. Commonly used approaches, such as linear interaction models, often…

Methodology · Statistics 2026-05-21 Jiehan Liu, Ziyi Liu, Yiqing Xu

Batched Single-Index Global Multi-Armed Bandits with Covariates

The multi-armed bandits (MAB) framework is a widely used approach for sequential decision-making, where a decision-maker selects an arm in each round with the goal of maximizing long-term rewards. In many practical applications, such as…

Machine Learning · Statistics 2026-05-21 Sakshi Arya, Hyebin Song

How Many Human Survey Respondents is a Large Language Model Worth? An Uncertainty Quantification Perspective

Large language models (LLMs) are increasingly used to simulate survey responses, but synthetic data can be misaligned with the human population, leading to unreliable inference. We develop a general framework that converts LLM-simulated…

Methodology · Statistics 2026-05-21 Chengpiao Huang, Yuhang Wu, Kaizheng Wang

Digital N-of-1 Trials and their Application in Experimental Physiology

Traditionally, studies in experimental physiology have been conducted in small groups of human participants, animal models or cell lines. Identifying optimal study designs that achieve sufficient power for drawing proper statistical…

Applications · Statistics 2026-05-21 Stefan Konigorski, Mathias Ried-Larsen, Christopher H Schmid

Improved convergence rate of kNN graph Laplacians: differentiable self-tuned affinity

In graph-based data analysis, $k$-nearest neighbor ($k$NN) graphs are widely used due to their adaptivity to local data densities. Allowing weighted edges in the graph, the kernelized graph affinity provides a more general type of $k$NN…

Machine Learning · Statistics 2026-05-21 Xiuyuan Cheng, Yixuan Tan, Nan Wu

Computational-Statistical Trade-off in Kernel Two-Sample Testing with Random Fourier Features

Recent years have seen a surge in methods for two-sample testing, among which the Maximum Mean Discrepancy (MMD) test has emerged as an effective tool for handling complex and high-dimensional data. Despite its success and widespread…

Machine Learning · Statistics 2026-05-21 Ikjun Choi, Ilmun Kim

On the error control of invariant causal prediction

Invariant causal prediction provides a useful framework for identifying causal predictors of a response using heterogeneous data from multiple environments. One valuable property of the original invariant causal prediction method is that it…

Methodology · Statistics 2026-05-21 Jinzhou Li, Jelle J Goeman

Goal-Oriented Lower-Tail Calibration of Gaussian Processes for Bayesian Optimization

Bayesian optimization (BO) selects evaluation points for expensive black-box objectives using Gaussian process (GP) predictive distributions. Kernel choice and hyperparameter selection can lead to miscalibrated predictive distributions and…

Machine Learning · Statistics 2026-05-20 Aurélien Pion, Emmanuel Vazquez

Mining Financial Data using Mixtures of Mirrored Weibull Distributions

Risk management is an important part of financial practice, essential for protecting assets and investments in modern-day volatile markets. This paper proposes a mixture of mirrored Weibull (MMW) distribution for modelling stock returns and…

Applications · Statistics 2026-05-20 Zijun Jia, Sharon X. Lee

Quantile-Based Effectiveness Persistence Function: A Tail-Focused Metric with Theory, Estimation, and Application to Biosimilar Evaluation

In clinical studies, persistence, which measures the duration of time a patient continues to take a prescribed medication without discontinuation, is increasingly recognized as a critical indicator of adherence to medication. Adherence…

Methodology · Statistics 2026-05-20 Sankaran P. G., Prasanth V. P., Midhu N. N

Federated Learning with Incomplete Data: When to Use Complete Cases and When to Weight

Privacy constraints have driven the rise of federated learning (FL), which enables multi-site analyses without sharing individual participant data. We develop a framework for FL with missing data, identifying conditions under which the…

Methodology · Statistics 2026-05-20 Jesus E. Vazquez, Yicheng Shen, Jason Akulian, Chad Hochberg, Theodore J. Iwashyna, Elizabeth A. Stuart, Jiayi Tong

Optimizing Computational-Statistical Runtime for Wasserstein Distance Estimation

Squared Wasserstein distance is a frequently used tool to measure discrepancy between probability distributions. This distance is typically computed between empirical measures of size $n$ from two underlying random samples. Unfortunately,…

Machine Learning · Statistics 2026-05-20 Peter Matthew Jacobs, Jeff M. Phillips

Tail Annealing for Heavy-Tailed Flow Matching

Standard generative models struggle with heavy-tailed data: Lipschitz architectures cannot produce power-law tails from Gaussian noise, and interpolating between heavy-tailed data and Gaussians is ill-posed. We propose a simple fix: apply…

Machine Learning · Statistics 2026-05-20 Jean Pachebat

Identifying Interventional Joint Distributions via Extended Bridge Functions

Existing identification results in proximal causal inference often focus on marginal interventional distributions using standard outcome or treatment bridge functions. These methods do not generally identify joint interventional…

Methodology · Statistics 2026-05-20 Constantin Schott

Estimating treatment duration effects via clone-censor-weight: a breast cancer case study

In this work, we study the estimation of treatment duration effects in observational survival data, where treatment and covariate histories evolve over time and longer observed durations are only attainable among individuals who remain…

Methodology · Statistics 2026-05-20 Charlotte Voinot, Noémie Simon-Tillaux, Emma Torrini, Stefan Michiels, Bernard Sebastien, Clément Berenfeld, Julie Josse

Sample Size Determination Under Selection Bias: Robust Tolerance Limits for Prevalent Cohort Data

Tolerance limits have received considerable attention in the statistical literature, with applications reaching far beyond their initial role in quality control. The well-known formula of Scheffé and Tukey (1944) establishes a simple,…

Methodology · Statistics 2026-05-20 James H. McVittie, Martin Lysy, Masoud Asgharian

Stationary subspace analysis for spatial data

Stationary subspace analysis (SSA) is a blind source separation framework that decomposes linearly mixed multivariate data into stationary and nonstationary components. We extend SSA to spatially indexed data by introducing spatial…

Methodology · Statistics 2026-05-20 Perttu Saarela, Klaus Nordhausen, Jaakko Pere, Anne M. Ruiz