Statistics — Scifaro

Global Sensitivity Analysis: a novel generation of mighty estimators based on rank statistics

We propose a new statistical estimation framework for a large family of global sensitivity analysis indices. Our approach is based on rank statistics and uses an empirical correlation coefficient recently introduced by Chatterjee [9]. We…

Methodology · Statistics 2026-05-25 Fabrice Gamboa, Pierre Gremaud, Thierry Klein, Agnès Lagnoux
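
For readers unfamiliar with the rank-based coefficient the abstract refers to, here is a minimal sketch of Chatterjee's correlation coefficient in its no-ties form, xi_n = 1 - 3 * sum|r_{i+1} - r_i| / (n^2 - 1), where r_i is the rank of the i-th response after sorting the pairs by the input. It illustrates the coefficient only, not the authors' sensitivity-index estimators; the function name `chatterjee_xi` is a hypothetical choice for this example.

```python
def chatterjee_xi(x, y):
    """Chatterjee's rank correlation xi_n(x, y).

    Assumes there are no ties in x or in y (the simple form of the formula).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")

    # Reorder the responses so that the inputs are increasing.
    order = sorted(range(n), key=lambda i: x[i])
    y_sorted = [y[i] for i in order]

    # ranks[i] is the rank (1..n) of y_sorted[i] among all responses.
    rank_order = sorted(range(n), key=lambda i: y_sorted[i])
    ranks = [0] * n
    for r, i in enumerate(rank_order, start=1):
        ranks[i] = r

    # Sum of absolute rank jumps between consecutive (x-sorted) observations.
    total = sum(abs(ranks[i + 1] - ranks[i]) for i in range(n - 1))
    return 1.0 - 3.0 * total / (n * n - 1)
```

For a perfectly monotone relationship the rank jumps are all 1, so the value is 1 - 3/(n + 1), which approaches 1 as n grows; for independent data the statistic fluctuates around 0.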

Trajectory-Oriented Optimization Via Adaptive Thompson Sampling And Grid Refinement: A Tutorial With The ADAPTIVE_TS Package

Stochastic simulators are increasingly used to expand the frontier of scientific knowledge and inform decision-making across real-world contexts. Simulator calibration, a process by which internal model inputs are tuned to match some…

Computation · Statistics 2026-05-25 David O'Gara, Arindam Fadikar, Mickaël Binois, Nicholson Collier, Jonathan Ozik

Joint Estimation of Marginal and Heterogeneous Treatment Effects

Randomized clinical trials typically aim to estimate a marginal treatment effect. While covariate adjustment can improve precision, it may change the estimand in nonlinear models due to noncollapsibility, leading to conditional rather than…

Methodology · Statistics 2026-05-25 Leticia Wuethrich, Torsten Hothorn

A note on closed-form solutions for estimating sample size when externally validating a binary prediction model based on $C$-statistic precision

External validation of clinical prediction models is crucial for assessing whether they are fit for use. The $C$-statistic is a widely used measure of discriminative performance of such models predicting a binary outcome. A method for…

Methodology · Statistics 2026-05-25 Denis A. Shah, Erick D. De Wolf, Pierce A. Paul, Laurence V. Madden
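
As background for the $C$-statistic mentioned in the abstract, the sketch below computes it in the standard way for a binary outcome: the proportion of (event, non-event) pairs in which the event received the higher predicted risk, with tied predictions counted as one half. This shows only the definition of the measure, not the sample-size method the note discusses; the function name `c_statistic` and the 0/1 outcome coding are assumptions made for this example.

```python
def c_statistic(y_true, y_score):
    """C-statistic (concordance) for a binary outcome coded 0/1.

    Compares every event with every non-event; ties in predicted risk
    contribute 0.5. Runs in O(n_events * n_nonevents) time.
    """
    events = [s for t, s in zip(y_true, y_score) if t == 1]
    nonevents = [s for t, s in zip(y_true, y_score) if t == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for p in events:
        for q in nonevents:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of events from non-events.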

Dirichlet-Based Monte Carlo Dropout for Uncertainty Estimation in Neural Networks

Traditional neural networks provide deterministic predictions without inherent uncertainty estimates. While Bayesian Neural Networks (BNNs) offer a principled approach to uncertainty quantification, their computational complexity limits…

Machine Learning · Statistics 2026-05-25 Rouaa Hoblos, Noura Dridi, Noureddine Zerhouni, Zeina Al Masry

Directional subset simulation method for reliability analysis

Estimating the probabilities of rare failure events is a key challenge in the reliability analysis of physical systems. Subset simulation (SS) is a very popular adaptive Monte Carlo method for this problem. In SS, the small failure…

Computation · Statistics 2026-05-25 Oindrila Kanjilal, Julien Bect

The frame problem in quantitative practice: ontological uncertainty and epistemic humility in an age of automated inference

Quantitative practice across statistics, engineering, and machine learning has been transformed by the automation of inference. Predictions are produced, validated, and deployed at scale and speed that human-mediated reasoning could not…

Methodology · Statistics 2026-05-25 William Fauriat

Asymmetric Scaling Laws from Sparse Features

We introduce a model for neural scaling laws under sparse activations. In the model, test loss is often dominated by rare coordinates that are never observed in the training input. This mechanism induces a novel bottleneck absent from dense…

Machine Learning · Statistics 2026-05-25 John Sous, Michael Winer

Concomitant DAG Learning: On the Roles of Noise Adaptivity, Sparsity, and Non-negativity

Directed acyclic graphs (DAGs) constitute a central modeling tool to enable principled reasoning about cause-effect interactions in complex systems. However, since the causal structure underlying a group of variables is often unknown and…

Machine Learning · Statistics 2026-05-25 Gonzalo Mateos, Samuel Rey, Hamed Ajorlou, Mariano Tepper

Generalized Rank Regression

Rank regression offers robustness to outliers and heavy-tailed response distributions, invariance to monotonic transformations, and improved efficiency under non-Gaussian errors, making it a versatile tool for analyzing complex data. This…

Methodology · Statistics 2026-05-25 Jiyuan Tu, Suqi Wu, Yichen Zhang, Wen-Xin Zhou

Coupled Training with Privileged Information and Unlabeled Data

In many prediction problems, we have extra information during training (for example, measurements that are expensive or slow to collect) that will not be available when the model is deployed. A common strategy is to first train a model that…

Machine Learning · Statistics 2026-05-25 Jiahao Shi, Omar Hagrass, Jason M. Klusowski

Regulatory Considerations for Using Artificial Intelligence Models to Reduce Sample Sizes in Registrational Studies

Applications of artificial intelligence (AI) in drug development continue to increase at a rapid pace. Regulatory authorities have provided increasingly clear perspectives on the use of AI in regulated applications, including recent draft…

Applications · Statistics 2026-05-25 Aaron M. Smith, Tala Fakhouri, Run Zhuang, Jonathan R. Walsh

A Direct Variance Estimation (DiVE) for Meta-Analysis of Median Differences

Meta-analyses of two-group studies that report median differences typically rely on methods that require, in addition to the median difference and sample size, summary measures of dispersion such as quartiles or ranges. Studies that do not…

Methodology · Statistics 2026-05-25 Tadahisa Okuda, Masataka Taguri, Kenichi Hayashi

Mixture-of-Finite-Mixtures Wishart Model for Clustering Covariance Matrices with an Application to Brain Functional Connectivity

Data represented as covariance-type matrices arise in many fields, including brain functional connectivity and diffusion tensor imaging. We develop the MFM-Wishart, a Bayesian model-based clustering approach for such data that combines…

Methodology · Statistics 2026-05-25 Zongyu Li, Stefano Castruccio, Zhiyong Zhang

Operationalizing Individual Fairness via Gradient Descent and Bradley-Terry Models

Individual fairness, the notion that "similar individuals should be treated similarly," provides a strong and flexible fairness guarantee for algorithmic decision makers. However, a barrier to implementing individual fairness in practice is…

Machine Learning · Statistics 2026-05-25 Conlan Olson, Linjun Zhang, Zhun Deng, Pragya Sur

LLM Sparsity Prior for Robust Feature Selection

Large language models (LLMs) offer a scalable mechanism to elicit domain-informed prior information for high-dimensional variable selection. However, existing methods such as LLM-Lasso are sensitive to weight quality, with performance…

Machine Learning · Statistics 2026-05-25 Caleb Skinner, Yihan Guo, Meng Li

Sample correlation adjustments for robust Multi-fidelity Monte Carlo under limited pilot sampling

Multi-fidelity Monte Carlo (MFMC) is a variance reduction method that leverages a multi-fidelity ensemble of models of varying cost and accuracy levels. Constructing an MFMC estimator with optimal variance requires knowledge of the…

Methodology · Statistics 2026-05-25 Michael Stanley, Thomas Coons, Geoffrey Bomarito, Patrick Leser, Joshua Pribe, James Warner

Diffusion-based Denoising Beats Vanilla Score Matching in Parameter Estimation: A Theoretical Explanation

Score matching is an alternative to maximum likelihood estimation when the normalizing constant is unknown or too costly to evaluate. However, vanilla score matching has been shown to be inefficient relative to maximum likelihood estimation for…

Machine Learning · Statistics 2026-05-25 Benedikt Lütke Schwienhorst, Nadja Klein, Johannes Lederer

Sequential Sensitivity Analysis for Multiple Assumptions: A Framework for Understanding Racial Disparity in Police Use of Force

Inferring racial discrimination in police use of force -- the average causal effect of civilian race on use of force -- requires two assumptions about policing prior to potential use of force: that officers do not discriminate in whom they…

Methodology · Statistics 2026-05-25 Thomas Leavitt, Jake Bowers, Luke Miratrix

Semi-Parametric Bayesian Additive Regression Trees for Risk Prediction with High-Dimensional Epigenetic Signatures and Low-Dimensional Covariates

In the era of precision medicine, genome-wide epigenetic modifications offer rich data that could inform risk prediction. However, these data are high-dimensional and exhibit complex dependence structures, which makes it difficult to…

Applications · Statistics 2026-05-25 Saurabh Bhandari, Parveen Bhatti, Brian C.-H. Chiu, Yuan Ji