Statistics — Scifaro

A Jensen-Shannon divergence based $k$-NN algorithm for missing value imputation in compositional data

A novel nonparametric method to impute missing values in compositional data is developed. The method is based on the $k$-NN algorithm, utilizes the Jensen-Shannon divergence and employs the Fréchet mean to allow for more flexibility…

Methodology · Statistics 2026-05-29 Michail Tsagris, Connie Stewart, Abdulaziz Alenazi

Eigen-Spike Emergence and Quadratic Equivalents for Conjugate Kernels on Nonlinearly Separable Data

Recent work in random matrix theory (RMT) has developed the notion of deterministic equivalents: typically linear surrogate models that approximate the spectral behavior of large nonlinear random matrices, such as nonlinear feature maps in…

Machine Learning · Statistics 2026-05-29 Collin Cranston, Zhichao Wang, Todd Kemp, Michael W. Mahoney

Matching Rates and Optimal Allocation for Federated Probe-Logit Distillation under Heterogeneous Bandwidth Budgets

In federated language modeling, $K$ nodes each hold $n$ samples but cannot pool data or exchange full-precision gradients or weights. We study the minimax rate at which a conditional distribution over $V$ tokens can be estimated when each…

Machine Learning · Statistics 2026-05-29 Prasanjit Dubey, Xiaoming Huo

Experimentation for Different Scheduling Policies on Queues: Mixed Differences-in-Q Estimators Based on Little's Law

In data centers, tasks are dispatched to various servers to evenly distribute the workload. When a data center considers implementing a new scheduling algorithm, it typically conducts an A/B test prior to deployment to assess the real-world…

Methodology · Statistics 2026-05-29 Nanshan Jia, Ramesh Johari, Nian Si, Zeyu Zheng

Hierarchical forecasting: The role of information

In hierarchical forecasting, the process of forecast reconciliation transforms a set of "base" or "raw" forecasts, which do not satisfy the hierarchical aggregation constraints in the real data, into a set of "coherent" forecasts, which do…

Methodology · Statistics 2026-05-29 Minh Nguyen, Farshid Vahid, Shanika L Wickramasuriya

Learning study similarity to investigate heterogeneity in meta-analysis using LLMs and triplet loss

Meta-analyses of observational studies often show substantial between-study heterogeneity, limiting the interpretability of pooled estimates. Meta-regression can be used to explore heterogeneity, but it is often underpowered to handle…

Methodology · Statistics 2026-05-29 Kanella Panagiotopoulou, Theodoros Evrenoglou

Change-point estimation for Weibull time series with copula-based Markov models

We study offline change-point estimation for time series data exhibiting nonlinear serial dependence. To address this problem, we propose a copula-based Markov chain model with Weibull marginal distributions, which is suitable for modeling…

Methodology · Statistics 2026-05-29 Li-Hsien Sun, Zong-Yuan Huang, Yi-Ling Huang, Chi-Yang Chiu, Ning Ning

Active learning strategy for excursion-set confidence regions of functional simulator outputs

Estimating excursion set confidence regions seeks to identify regions where a function may exceed some threshold with a given confidence level. This paper focuses on estimating such confidence regions in cases where the function has random…

Methodology · Statistics 2026-05-29 Lucas Brunel, Mathieu Balesdent, Loïc Brevault, Rodolphe Le Riche, Bruno Sudret

`pandemonium`: High Dimensional Analysis in Linked Spaces

A common challenge in data analysis is uncovering relationships between predictors and responses in problems involving large numbers of both. When the number of predictors and responses is limited, visual approaches are particularly…

Computation · Statistics 2026-05-29 Gabriel McCoy, German Valencia, Ursula Laa

Deep Optimal Individualized Treatment Rules for Bivariate Survival Outcomes via Adaptive Prediction-Powered Learning

In randomized trials involving multiple treatments, bivariate survival outcomes present significant analytical challenges for making decisions. This paper addresses the problem of deriving optimal individualized treatment rules to maximize…

Machine Learning · Statistics 2026-05-29 Kun Ren, Yifan Cui, Wen Su

Model-free estimation in scattering analysis of microscopy

The mean squared displacement (MSD) of particles or probes is commonly estimated from microscopy videos using particle tracking approaches, which rely on tuning parameters manually, and are often unstable over the entire lag time range,…

Applications · Statistics 2026-05-29 Tong Lin, Jinseok Lee, Matt Helgeson, Megan T. Valentine, Yimin Luo, Mengyang Gu

Power Estimation for Longitudinal Studies with Time Dependent Covariates Using Generalized Method of Moments

Longitudinal studies frequently incorporate covariates that evolve over time, creating complex dependence structures between outcomes and predictors. When covariates are time dependent, standard power analysis tools, largely developed for…

Methodology · Statistics 2026-05-29 Niloofar Ramezani, Oliver Hurst

Low Rank for Rank: Uncertainty-Aware Task-Specific LLM Ranking under Sparse Pairwise Comparisons

Pairwise human-preference platforms such as Chatbot Arena have become central to large language model (LLM) evaluation, yet reliable task-specific ranking remains challenging. Global leaderboards mask task heterogeneity, while ranking each…

Methodology · Statistics 2026-05-29 Jiachun Li, David Simchi-Levi, Will Wei Sun

Gaussian Differentially Private $e$-values: Construction, Threshold Calibration, and Multiple Testing

This paper develops a framework for differentially private $e$-values under Gaussian differential privacy ($\mu$-GDP). We characterize the canonical noise mechanism, establishing that optimal multiplicative perturbation follows a Gaussian…

Methodology · Statistics 2026-05-29 Qi Kuang, Bowen Gang, Yin Xia

Efficient Inference for Incremental Causal Effects of Time to Treatment

We consider time to treatment initiation. This can commonly occur in preventive medicine, such as disease screening and vaccination; it can also occur with non-fatal health conditions such as HIV infection without the onset of AIDS. While…

Methodology · Statistics 2026-05-29 Zhichen Zhao, Andrew Ying, Ronghui Xu

Conformal prediction for functional time series: Application to age-specific mortality rates

In demographic literature, forecast uncertainty is often quantified with a statistical model. This model-based approach may potentially suffer from drawbacks, namely model misspecification, selection effect, and lack of finite-sample…

Applications · Statistics 2026-05-29 Han Lin Shang

Rapid Approximation Prediction for Kriging

Exact Kriging and conditional simulation (CS) for uncertainty quantification are computationally infeasible for modern spatial analyses with large numbers of observations and dense prediction grids. We present a rapid approximation to the…

Methodology · Statistics 2026-05-29 Ziyu Li, Gregory Fasshauer, Douglas Nychka

Outcome-Calibrated Regression and Predicted Outcome-Based Inference

Regression is a fundamental tool in scientific research. Ordinary least squares (OLS), one of the most widely used regression methods, enjoys several desirable properties, including the best linear unbiased estimator (BLUE) property. It is…

Methodology · Statistics 2026-05-29 Hwiyoung Lee, Shuo Chen

Prediction-Powered Inference Across Many Tasks for AI Evaluation & Social Science Research

Many applications require statistically valid inference across many related tasks, while using only a handful of high-quality labels per hypothesis. In AI evaluation, these tasks may correspond to model behaviors across prompts, subgroups,…

Machine Learning · Statistics 2026-05-29 Nicolas Emmenegger, Ellery Stahler, Chara Podimata

Valid and efficient possibilistic fusion

Besides the classical motivation of fusing evidence from multiple sources, modern inferential procedures based on randomization, resampling, and data splitting often introduce analyst-generated multiplicity, where aggregating outputs across…

Methodology · Statistics 2026-05-29 Leonardo Cella