Statistics — Scifaro

CART Random Forests as Sequential Allocation over Random Opportunity Sets: A Stochastic-Control Theory of Ensemble Risk

CART random forests are among the most widely used modern predictive methods, with well-documented empirical success. Yet, at the mechanistic level, the algorithm is often treated as a black box because of its complexity. In this paper, we…

Machine Learning · Statistics 2026-05-27 Tianxing Mei, Yingying Fan, Mingming Leng, Jinchi Lv

Data-driven sparse identification of governing PDEs via knockoff filters and multi-criteria trade-offs

We propose KO-PDE-IDENT, a data-driven framework for identifying parsimonious partial differential equations (PDEs) with false discovery rate (FDR) control. PDE discovery from noisy observations is often hindered by extreme…

Applications · Statistics 2026-05-27 Pongpisit Thanasutives, Naichang Ke, Yoshinobu Kawahara

Statistical Inference and Stability Boundaries of Multi-cellular Interaction Hypergraphs from Asynchronous Event Streams

We introduce the Hyperedge-triggered Hawkes (HTH) process for inferring higher-order interaction structure in multi-cellular systems from asynchronous event-time data. Beyond standard pairwise excitation, the HTH intensity includes a term…

Methodology · Statistics 2026-05-27 Zihan Xu

Log-linear Model for Dual System Estimation and Computational Considerations

Dual system estimation (DSE) is heavily used in Census Bureau operations. DSE requires methods to infer the population size among those with missing data from one or both data sources. The use of…

Computation · Statistics 2026-05-27 Zhiyuan Lu

Using Transcripts for Nonparametric Monitoring of Serial Dependence

Control charts for process monitoring are widely used in practice. Most control charts require the monitored (residuals) process to be serially independent (and to satisfy specified distributional assumptions), whereas undetected dependence…

Methodology · Statistics 2026-05-27 Christian H. Weiß, José M. Amigó

Target-Oriented Statistical Compression: Sufficiency, Reverse Martingales, and Sequential Monitoring

Statistical procedures rarely retain all features of the observed data. A sufficient statistic removes information irrelevant to a parameter; a maximum likelihood estimate compresses an empirical objective into an optimizing point; and a…

Methodology · Statistics 2026-05-27 Yuan-chin Ivan Chang

Global Average Treatment Effects for Individualized Randomization Experiments with Aggregate Data

Individualized randomized experiments are central to online platforms for optimizing personalized decisions in complex environments. In two-sided markets, however, standard treatment effect estimation is often invalid due to strong temporal…

Methodology · Statistics 2026-05-27 Shuguang Yu, Ting Li, Yuchen Lu, Chengchun Shi, Fan Zhou, Zhichao Zou, Peng Zhen, Hongtu Zhu

Learning a directed acyclic graph with additive heteroscedastic errors

This paper studies causal discovery for a directed acyclic graph under a structural equation model with additive heteroscedastic errors. We first establish new identifiability results for location-scale noise models, showing that…

Methodology · Statistics 2026-05-27 Xintao Xia, Li Chen, Yue Hu, Chunlin Li

Improving inverse probability of censoring weighting for win statistics with composite survival outcomes

Win statistics, including the win ratio, net benefit, and win odds, summarize treatment effects on hierarchical composite endpoints by sequentially comparing patient pairs on component outcomes ordered by clinical importance, proceeding to…

Methodology · Statistics 2026-05-27 Xi Fang, Fan Li

Structure-Adaptive Conformal Inference for Large-Scale Out-of-Distribution Testing

This paper addresses structured out-of-distribution (OOD) testing in high-stakes machine learning applications. Traditional conformal methods rely on joint exchangeability, making it difficult to incorporate auxiliary information such as…

Methodology · Statistics 2026-05-27 Rongyi Sun, Wenguang Sun, Zinan Zhao

Fast Computational Methods for Regularized Estimating Equations

Estimating equations arise in a wide range of statistical applications, including longitudinal and clustered data analysis, survival analysis, econometrics, and semiparametric inference. In high-dimensional settings, adding…

Computation · Statistics 2026-05-27 Weihua Shi, Yixuan Li, Yi Lian, Archer Y. Yang, Yue Zhao

Confounder Detection via Treatment Intent: A New Observational Study Design

Understanding the effects of interventions is central to scientific progress, with randomized controlled trials (RCTs) regarded as the gold standard for causal inference in many applied fields. However, RCTs are costly, time-consuming, and…

Methodology · Statistics 2026-05-27 Drago Plecko, Patrik Okanovic, Torsten Hoefler, Elias Bareinboim

Small-Area Precipitation Forecasting and Drought–Flood Early Warning with Reverse-Martingale Regularized Recurrent Networks

Small-area precipitation forecasts support real-time decisions for reservoir operation, irrigation planning, drought monitoring, and flash-flood response. Operational value depends not only on point accuracy, but also on calibrated…

Applications · Statistics 2026-05-27 Foo Hui-Mean, Yuan-chin Ivan Chang

When Does LeJEPA Learn a World Model?

A representation that scrambles the true degrees of freedom of the world cannot support reliable planning or compositional generalization. We prove that LeJEPA (alignment plus Gaussian regularization) linearly recovers the world's latent…

Machine Learning · Statistics 2026-05-27 David Klindt, Yann LeCun, Randall Balestriero

Unobserved Heterogeneity in Threshold Regression Based on the Hitting Times of a Reflected Brownian Motion for Recurrent Hypoglycemia

Analyses of recurrent hypoglycemia are critical for effective treatment management in diabetic patients. Typically, within-subject dependency in such analyses is captured through subject-level frailty. Recent research has modeled recurrent…

Methodology · Statistics 2026-05-27 Yingfa Xie, Haoda Fu, Yuan Huang, Jun Yan

Cross-modal dependence analysis with asynchronous longitudinal multimodal data

We propose a Bayesian latent variable model to estimate covariate-assisted dependence structures across multiple modalities of multivariate data that may be observed asynchronously. This setting commonly arises in longitudinal biomedical…

Methodology · Statistics 2026-05-27 Kun Qian, Hyung G. Park

Beyond Differences: Doubly Robust Meta-Learners for Ratio-Based Treatment Effects

When treatment effects are naturally expressed as ratios, as in medicine, pricing, and marketing, the ratio-based CATE $\tau(x) = E[Y|W=1,X=x] / E[Y|W=0,X=x]$ is the appropriate estimand. Yet existing estimators either impose a…

Machine Learning · Statistics 2026-05-27 Michael Fuchs, Dominik Kreiss

Learning Nonlinear Factor Models with Unknown Monotone Links from Incomplete and Noisy Data

We study a nonlinear factor model in which observed responses depend on low-rank latent factors through an unknown monotone link function. This setting is challenging and largely underexplored due to severe nonconvexity and identifiability…

Machine Learning · Statistics 2026-05-27 Yutong Chao, Resat Gökhan, Jalal Etesami, Ali Habibnia

Length-biased Birnbaum-Saunders quantile regression with application to water evaporation

Length-biased distributions arise naturally in environmental, reliability, and economic studies where the sampling mechanism favors larger observational units. In this paper, we propose a quantile regression model based on the length-biased…

Methodology · Statistics 2026-05-27 Helton Saulo, Tailine Nonato, Roberto Vila

A Post-Processing Conformal Prediction Approach for Conditional Coverage via Pivotal Scores

While Conformal Prediction (CP) has proven to be a powerful framework for uncertainty quantification, guaranteeing conditional coverage remains a central challenge. Although finite-sample, distribution-free conditional validity is known to…

Methodology · Statistics 2026-05-27 Félix Laplante