Statistics - Scifaro

Topological reconstruction of Rubin multiple imputation via coarse proximity, Seifert van Kampen gluing and Hurewicz invariants

Rubin multiple imputation (MI) generates plausible data completions to account for uncertainty and statistical variability but provides little insight into their global organization. We introduce a topological reconstruction approach that…

Applied Statistics · Statistics 2026-06-27 Arturo Tozzi

Multivariate Varying-Coefficient BART with Graphical Horseshoe Priors

Modern multivariate regression problems involve several related outcomes whose regression effects are not only nonlinear, heterogeneous, and outcome-specific, but also where the residual dependence among outcomes is scientifically…

Statistical Methodology · Statistics 2026-06-27 Soham Ghosh, Sameer K. Deshpande

Panel Flow Matching: A Generative Approach to Learning Distributions of Longitudinal Data

Learning distributions of longitudinal data is central to tasks such as visualization, completion, classification, and synthetic data generation, but it remains statistically challenging because longitudinal observations are often…

Statistical Methodology · Statistics 2026-06-27 Jianbin Tan, Pixu Shi, Anru R. Zhang

spca: An R package to Compute Least Squares Sparse Principal Components

This paper introduces the R package spca, which provides a computational framework for least squares sparse principal component analysis (LS-SPCA). Unlike other SPCA methods, LS-SPCA generates uncorrelated sparse principal components (sPCs)…

Statistical Computing · Statistics 2026-06-27 Giovanni Maria Merola

Connectivity Estimation using Stochastic Graph Heat Modelling

A growing number of techniques leverage the spatial structures that underlie many real-world datasets. Despite these advances, the complementary task of estimating spatial structures and understanding their role within these techniques has…

Machine Learning · Statistics 2026-06-27 Stephan Goerttler, Min Wu, Fei He

Learning heterogeneous treatment effects under principal stratification

Principal stratification provides a foundational framework for causal inference with intermediate outcomes by defining causal effects within subpopulations, yet existing work has largely focused on average effects across strata rather than…

Statistical Methodology · Statistics 2026-06-27 Jiaqi Tong, Fan Li

On Modeling Cylindrical Data with a Discrete Circular Component and Its Environmental Applications

Standard statistical methods are often inadequate for modeling the joint dependence between linear and circular variables, and existing methods for modeling this dependence are designed only for continuous variables. However, circular data…

Statistical Methodology · Statistics 2026-06-27 Brajesh Kumar Dhakad, Jayant Jha

Beta-trees for testing multivariate goodness-of-fit and localizing deviations from a model

We introduce a novel goodness-of-fit (GOF) procedure based on Beta-tree partitions. A Beta-tree produces a data-adaptive partition of the sample space into regions and provides guaranteed finite sample confidence intervals for the…

Statistical Methodology · Statistics 2026-06-27 Valerie N. P. Ho, Guenther Walther

Generated outcomes as generated regressors: Equivalences in recursive causal estimation

Time-varying treatment effects, surrogate-identified treatment effects, and mediation effects can all be written as recursive regressions, in which each regression's predicted values become generated outcomes for the next regression. We…

Statistical Methodology · Statistics 2026-06-27 Wisse Rutgers, Rahul Singh

A Bayesian latent Gaussian process framework for aerodynamic uncertainty quantification

Predicting the aerodynamic performance (e.g. lift, drag, and moment coefficients) of an aircraft is challenging -- computational models are biased and direct simulations are prohibitive. A pragmatic way to overcome this limitation is by…

Machine Learning · Statistics 2026-06-27 Geoffrey Davis, Ashwin Renganathan

Perspectives on Latent Factor Indeterminacy and its Implications for Data Representation

The common factor analytic model is related to Helmholtz and Boltzmann machines, can be conceived as a linear autoencoder, or can be thought of as a single-hidden-layer generative neural network. We thus consider it a basal generative…

Machine Learning · Statistics 2026-06-27 Carel F. W. Peeters

Methods to address measurement error in both Outcome and Covariates

Biomedical research is increasingly relying on readily available routine data, such as electronic health records. Routinely collected data, as well as datasets from large cohorts, are often prone to measurement error which, if not addressed…

Applied Statistics · Statistics 2026-06-27 Pamela A. Shaw, Bryan E. Shepherd

Variance Reduction for Stochastic Gradient Generalized Non-reversible Langevin Monte Carlo Algorithms

We study the leading-order fluctuation of stochastic gradient Euler-Maruyama estimators for generalized non-reversible Langevin dynamics. Under structural assumptions tailored to the small-stepsize central limit theorem and under an…

Machine Learning · Statistics 2026-06-27 Bingye Ni, Xiaoyu Wang, Yingli Wang, Lingjiong Zhu

Measurement Induced Confounding

A critical assumption of observational studies is that all confounding variables must be known and sufficiently adjusted for to estimate causal effects. An implicit, and often overlooked, aspect of this assumption is that all confounding…

Statistical Methodology · Statistics 2026-06-27 George Perrett, Klint Kanopka

Inferring Comprehensive Cohort Causal Effects in the Presence of Unmeasured Confounding and Missing Outcomes

This paper presents a methodological framework for estimating the comprehensive cohort causal effect (CCCE) in mixed-design clinical studies that combine randomized controlled trials (RCTs) and parallel observational studies (OBS). Our…

Statistical Methodology · Statistics 2026-06-27 Shiyao Xu, Razieh Nabi, Martin Underwood, Daniel Scharfstein

Composition as Direction: An Active-Set Ray-Based Model for Sparse High-Dimensional Compositional Data

[Working Draft] Compositional data are central to microbial, ecological, and environmental research, yet often have four features that are difficult to accommodate jointly: exact zeros, latent dependence among components,…

Statistical Methodology · Statistics 2026-06-27 Michael R Schwob, Jyotishka Datta

Inverse Probability Weighting in a Post-Bayesian World

We present a justification of the use of Inverse Probability Weighting (IPW) in a post-Bayesian framework, in which the bias-correction provided by IPW in a frequentist context is reframed as a reweighting of the Kullback-Leibler (KL)…

Statistical Methodology · Statistics 2026-06-27 Owen Thomas, William Denault, Valeria Vitelli

Adaptive Iterative Hard Thresholding for Online High-dimensional Quantile Regression

Online high-dimensional regression requires algorithms that can update sequentially while preserving structural sparsity. We propose Adaptive Iterative Hard Thresholding (AIHT), an online sparse-regression framework that alternates…

Machine Learning · Statistics 2026-06-26 Zitian Zhou, Nan Lin

A bootstrap approach to prediction-powered inference

Prediction-powered inference (PPI) refers to a two-level situation where the statistician observes a set of (x, y) pairs and another set of x's with the responses y missing. Also available is some independent background data from which…

Statistical Methodology · Statistics 2026-06-26 Bradley Efron
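The two-level PPI setup described above can be made concrete with the widely used prediction-powered estimate of a mean: average the model's predictions over the x's with missing y, then subtract the average prediction error measured on the labeled (x, y) pairs. The sketch below pairs that estimate with a generic percentile bootstrap. It is a minimal illustration under stated assumptions, not the bootstrap procedure proposed in the paper. It assumes the predictions f(x) come from a model fitted beforehand on the independent background data, and the names `ppi_mean`, `ppi_bootstrap_ci` and `predict` are hypothetical.

```python
import random
from statistics import fmean


def ppi_mean(y_lab, f_lab, f_unlab):
    """Prediction-powered point estimate of E[y] (illustrative sketch).

    y_lab:   observed responses for the labeled (x, y) pairs
    f_lab:   model predictions f(x) for those same labeled x's
    f_unlab: model predictions f(x) for the x's whose y is missing
    """
    # "Rectifier": average prediction error on the labeled pairs.
    rectifier = fmean(f - y for f, y in zip(f_lab, y_lab))
    # Mean prediction on the unlabeled x's, debiased by the rectifier.
    return fmean(f_unlab) - rectifier


def ppi_bootstrap_ci(y_lab, f_lab, f_unlab, n_boot=2000, alpha=0.1, seed=0):
    """Generic percentile-bootstrap interval for ppi_mean (not the paper's method)."""
    rng = random.Random(seed)
    n, N = len(y_lab), len(f_unlab)
    stats = []
    for _ in range(n_boot):
        # Resample labeled pairs jointly so each y stays with its own prediction.
        idx = [rng.randrange(n) for _ in range(n)]
        y_b = [y_lab[i] for i in idx]
        f_b = [f_lab[i] for i in idx]
        # Resample the unlabeled predictions independently of the labeled set.
        u_b = rng.choices(f_unlab, k=N)
        stats.append(ppi_mean(y_b, f_b, u_b))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


if __name__ == "__main__":
    rng = random.Random(1)
    predict = lambda x: 2 * x + 0.8  # hypothetical model with a small bias
    x_lab = [rng.gauss(0, 1) for _ in range(100)]
    y_lab = [2 * x + 1 + rng.gauss(0, 0.5) for x in x_lab]
    x_unlab = [rng.gauss(0, 1) for _ in range(500)]
    f_lab = [predict(x) for x in x_lab]
    f_unlab = [predict(x) for x in x_unlab]
    print(ppi_mean(y_lab, f_lab, f_unlab))
    print(ppi_bootstrap_ci(y_lab, f_lab, f_unlab))
```

Resampling the labeled pairs and the unlabeled predictions separately mirrors the two-sample structure of the data. The rectifier term removes the model's systematic bias, so a biased predictor still yields a consistent estimate of the mean.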

Conformal Prediction with Macro-Coverage Guarantees

Prediction sets should have high coverage to be useful, but some coverage notions are more practically relevant than others. In the classification setting, class-conditional coverage requires that the prediction set (i.e., the set of…

Statistical Methodology · Statistics 2026-06-26 Aabesh Bhattacharyya, Tiffany Ding, Rina Foygel Barber
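Class-conditional coverage, the stricter notion the abstract contrasts with other coverage guarantees, is commonly targeted by running split conformal calibration separately within each class. The sketch below shows that standard per-class baseline. It is not the macro-coverage method of the paper. It assumes a fitted classifier that outputs probability vectors and uses the nonconformity score 1 - p(y | x). The names `class_thresholds` and `prediction_set` are hypothetical.

```python
import math


def class_thresholds(cal_probs, cal_labels, alpha):
    """Per-class split conformal thresholds from a held-out calibration set.

    cal_probs:  list of predicted probability vectors (one per calibration point)
    cal_labels: true class index for each calibration point
    alpha:      target miscoverage; each class aims for >= 1 - alpha coverage
    """
    n_classes = len(cal_probs[0])
    thresholds = []
    for k in range(n_classes):
        # Nonconformity score 1 - p(k | x), computed only on points labeled k.
        scores = sorted(1 - p[k] for p, y in zip(cal_probs, cal_labels) if y == k)
        n_k = len(scores)
        rank = math.ceil((n_k + 1) * (1 - alpha))
        # Too few calibration points for this class: include it in every set.
        thresholds.append(scores[rank - 1] if rank <= n_k else math.inf)
    return thresholds


def prediction_set(probs, thresholds):
    """Classes whose score is at or below that class's own threshold."""
    return [k for k, p in enumerate(probs) if 1 - p <= thresholds[k]]


if __name__ == "__main__":
    cal_probs = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4],
                 [0.3, 0.7], [0.1, 0.9], [0.5, 0.5]]
    cal_labels = [0, 0, 0, 1, 1, 1]
    q = class_thresholds(cal_probs, cal_labels, alpha=0.5)
    print(prediction_set([0.85, 0.15], q))  # [0]
```

Because each threshold is computed only from calibration points of its own class, the coverage guarantee holds within each class under exchangeability of that class's points. The price is that rare classes with few calibration points receive infinite thresholds and therefore appear in every prediction set.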