Statistics - Scifaro

Hierarchical Clustering As a Novel Solution to the Notorious Multicollinearity Problem in Observational Causal Inference

Multicollinearity is a long-standing challenge in observational causal inference, especially in regressions: highly correlated independent variables make it hard to isolate their individual impacts on outcomes of interest. While common…

Statistical Methodology · Statistics 2026-06-30 Yufei Wu, Zhiying Gu, Alex Deng, Jacob Zhu, Linsha Chen

Simultaneous confidence bands for cumulative hazard via exchangeable bootstrap and box calibration

Resampling-based simultaneous confidence bands for cumulative hazard functions often undercover in finite samples with right censoring. We study two aspects of the construction that can contribute to this gap, the resampling scheme and the…

Statistical Methodology · Statistics 2026-06-29 Min Lin, Grzegorz Rempala, Eben Kenah, Qianying Lin

Universal Inference for model selection on networks

Model selection and hypothesis testing are important tasks on networks. A key challenge lies in the inherent dependence in network data, as well as the fact that typically only a single realization is observed. As a result, many existing…

Statistical Methodology · Statistics 2026-06-29 Eric Yanchenko, Jonathan P. Williams, Ryan Martin

Residual-on-Residual Regression as a Tool for Effect Estimation in Observational Data

Epidemiologists increasingly use machine learning to adjust for high-dimensional confounding. Augmented inverse probability weighting (AIPW) and targeted maximum likelihood estimation (TMLE) are most widely used but may yield different…

Statistical Methodology · Statistics 2026-06-29 Ashley I. Naimi, Qianhui Jin, Ya-Hui Yu, Sara M. Parisi, Lisa M. Bodnar

Exponential-Family Tensor Completion via Nonconvex Dual Total-Variation Regularization

With the emergence of various tensor data, tensor completion from partial measurements has attracted widespread attention in data science and signal processing. Total Variation (TV) has been widely used as an effective regularization…

Statistical Methodology · Statistics 2026-06-29 Wenfei Cao, Yang Chen, Qibin Zhao, Jinglai Li, Andrzej Cichocki

SGD at the Edge of Stability: Stochastic Stabilization with Large Learning Rates

Modern deep learning has been shown to operate at the edge of stability, routinely using learning rates far larger than those justified by classical optimization theory. Most prior analyses of the edge of stability phenomenon focus on…

Machine Learning · Statistics 2026-06-29 Konstantinos Emmanouilidis, Lachlan MacDonald, Salma Tarmoun, Rene Vidal

Cross-Fitted Survey-Weighted TMLE with Design-Based Variance for Causal Machine Learning

Cross-fitting is not a refinement of survey-weighted causal machine learning but, once the nuisances are flexible, what restores valid inference. We study the population average treatment effect under a stratified multistage design,…

Statistical Methodology · Statistics 2026-06-29 M. Ehsan Karim

Dynamic Prediction of Alternating Recurrent Events via Neural Network

Alternating recurrent events -- event times of a specific nature that trigger a secondary refractory period -- occur in a wide range of fields, including behavioral science, criminal justice, and biostatistics. Analysis of these events…

Machine Learning · Statistics 2026-06-29 Abigail Loe, Susan Murry, Zhenke Wu

Separation Capacity of Scattering Networks

In this paper, we attempt to enhance the theoretical understanding of convolutional neural networks (CNNs) as feature extractors in classification tasks by analyzing them through the lens of Cover's function-counting theory. Specifically,…

Machine Learning · Statistics 2026-06-29 Konstantin Häberle, Helmut Bölcskei

Spatial Dependence in the Self-Response: Spatial Dependence, Modeling, and Operational Consequences

The U.S. Census Bureau's Low Response Score (LRS) is a central planning instrument for identifying places likely to require additional self-response outreach and nonresponse follow-up. The published LRS is intentionally interpretable: it…

Applied Statistics · Statistics 2026-06-29 Emanuel Ben-David

Optimization Dynamics Imprint Semantic Specificity in Contrastive Embedding Norms

Contrastive embedding models trained with scale-invariant losses are typically paired with distance metrics like cosine similarity, effectively ignoring embedding magnitudes. However, surprisingly, empirical studies reveal that despite…

Machine Learning · Statistics 2026-06-29 Ziwei Su, Junyu Ren, Victor Veitch

Tuning-Free Efficient Estimation for Multi-Source Data via Covariance-Aware Shrinkage

Modern statistical learning problems often involve multiple related data sets, where learning efficiency on a target set can be improved by utilizing related source sets, while heterogeneity among the source sets may introduce bias.…

Statistical Methodology · Statistics 2026-06-29 Wenbo Jing, Xi Chen, Yaqi Duan, Kaizheng Wang, Yichen Zhang

Doubly Robust Adaptive Conformal Inference for Causal Effects Under Temporal Dependence

We propose doubly robust adaptive conformal inference (DR-ACI), which constructs prediction intervals for doubly robust pseudo-outcomes under temporal dependence.

Machine Learning · Statistics 2026-06-29 Andreas Koukorinis, Ricardo Silva

Factorizable Normalizing Flows for parameter-dependent density morphing

Normalizing Flows excel at modeling a single fixed density, yet many problems across the sciences, such as high energy physics, instead require modeling how that density deforms as a function of continuous parameters: the strength of a…

Machine Learning · Statistics 2026-06-29 Davide Valsecchi, Mauro Donegà, Rainer Wallny

Non-parametric recovery of causal diffusion mechanisms from steady-state observations

We consider sparse multivariate stochastic systems that evolve in continuous time according to a causal mechanism and present methodology to recover the system's time-infinitesimal transition mechanism from mere cross-sectional data. This…

Machine Learning · Statistics 2026-06-29 Richard Schwank, Mathias Drton

SGD Provably Prioritizes a Shortcut Spurious Feature in the XOR Model

Neural networks are known to be susceptible to over-reliance on spurious correlations. However, the precise mechanism by which models exploit shortcut features is not fully understood, and algorithms to mitigate this behavior rely on as yet…

Machine Learning · Statistics 2026-06-29 Tyler LaBonte, Vidya Muthukumar

Multiscale Dynamic Dependence Estimation over Networks

In numerous scientific and industrial settings, observed multivariate time series are often nonstationary, i.e., they comprise data whose second-order properties vary over time. An additional feature of many modern datasets is that the…

Statistical Methodology · Statistics 2026-06-29 Cristian F. Jiménez-Varón, Marina I. Knight, Matthew A. Nunes

A Stochastic-Geometric Theory of Scaling Laws in Grokking

Delayed generalization (i.e., grokking) refers to the phenomenon in which a neural network fits its training data early in training but only begins to generalize after a prolonged delay, often through an abrupt transition. Despite extensive…

Machine Learning · Statistics 2026-06-29 Róisín Luo, Christian Gagné, Jonas Ngnawé, Ihsan Ullah, Karyn Morrissey

Extrapolating from Regularised Solutions for Solving Ill-Conditioned Linear Systems in Machine Learning

Rapid prototyping of algorithms is a critical step in modern machine learning. Most algorithms exploit linear algebra, creating a need for lightweight numerical routines which -- while potentially sub-optimal for the task at hand -- can be…

Machine Learning · Statistics 2026-06-29 Disha Hegde, Jon Cockayne, Chris. J. Oates

Evaluating HWE and Association in Genome Wide Association Studies: A Unified Procedure

In genome-wide association studies (GWASs) based on a case-control design, single nucleotide polymorphisms (SNPs) are typically evaluated for an association test and a Hardy-Weinberg equilibrium (HWE) goodness-of-fit test. SNPs are then…

Statistical Methodology · Statistics 2026-06-29 Stefan Böhringer, Hajo Holzmann