Statistics - Scifaro

Surprises in Proper Positive-Only Learning

Binary classification from positive-only samples is a variant of PAC learning in which the learner receives i.i.d. samples from the positive region of an unknown target concept, but is evaluated under the original distribution (which places…

Machine Learning · Statistics 2026-06-26 Shai Ben-David, Farnam Mansouri, Anay Mehrotra, Manolis Zampetakis

Push Puppet Networks: Structured Bayesian Pruning Algorithm for Language Model Compression

This paper presents push puppet networks, a novel Bayesian algorithm for structured pruning of large language models. The push puppet network learns a hierarchical function during training that can optimally determine specific network…

Applied Statistics · Statistics 2026-06-26 Robert Kubinec

Experimental Design When N Equals One

N-of-1 trials, or time-series experiments, are widely used in clinical research and online platforms. Yet the theoretically optimal design for estimating many treatment effects remains unclear. We propose a simple Markovian framework for…

Statistical Methodology · Statistics 2026-06-26 Wenxuan Guo, Tengyuan Liang

A joint meta-analysis framework for the accuracy of two diagnostic tests accounting for varying study designs

Meta-analyses of the accuracy of two diagnostic tests typically assume tests are independent conditional on true disease status. This assumption is often unrealistic and violation leads to biased estimates of the accuracy of tests used in…

Statistical Methodology · Statistics 2026-06-26 Vera Hudak, Nicky J. Welton, Efthymia Derezea, Hayley E. Jones

Bayesian Simultaneous Credible Bands for Polynomial Regression

Quantifying efficacy uncertainty across the entire dose range is crucial in dose-response studies. Although the frequentist simultaneous confidence band (FSCB) is widely used for this purpose, it does not readily incorporate prior…

Statistical Methodology · Statistics 2026-06-26 Fei Yang, Yang Han, Wei Liu, Ian Hall

A beta-binomial model respecting randomization and its comparison to the standard beta-binomial model that ignores randomization for the meta-analysis of rare events

Background: One of the suggested models for meta-analysis with rare events is the beta-binomial model (BBM). The main advantage of this model compared to inverse-variance models is that it uses information from zero cells without needing a…

Statistical Methodology · Statistics 2026-06-26 Tim Mathes, Maxi Schulz, Oliver Kuss

Local Fokker–Planck Geometry for Score Estimation: Heat-Ball Mean-Value Representations and Exact High-Dimensional Sampling

Score-based generative models and Langevin samplers rely on estimating the score function $\nabla_x\log p_t(x)$ of a forward diffusion. Classically this is tractable when the drift is linear: the marginal density is Gaussian and the score…

Machine Learning · Statistics 2026-06-26 Jiayao Bai, Lang Deng, Yi Du, Yifei Jia

Robust estimation of occupation probabilities for coarsened multistate processes

We derive augmented inverse probability weighted estimators for occupation probabilities of multistate models under two levels of coarsening: right-censoring and baseline exposure. The key exchangeability assumption for identification is…

Statistical Methodology · Statistics 2026-06-26 Niklas Nyboe Maltzahn, Gergely Dániel Lukáts, Kjetil Røysland

Robust and Scalable Sure Screening of Fixed Effects in Ultrahigh-Dimensional Linear Mixed Models

In modern applications of linear mixed models, the number of candidate fixed-effects covariates can grow exponentially with the sample size, while dependence induced by random effects and possible data contamination pose substantial…

Statistical Methodology · Statistics 2026-06-26 Abhik Ghosh, Magne Thoresen

Adversarial Contamination Meets Hard Thresholding: An Iterative Algorithm with Signal Adaptivity and Minimax Optimality

Pervasive data contamination (stemming from measurement errors, outliers, or adversarial corruption) has motivated the development of robust statistical methods. In this context, we propose a two-stage Adversarial…

Machine Learning · Statistics 2026-06-26 Shixiang Liu, Hanming Yang

Design-Aware Variance Reduction for Switchback Experiments: A Comparative Study

Switchback experiments and other clustered randomized designs are widely used on online platforms, but the clustered, time-dependent nature of these designs can make standard variance reduction methods behave differently than in standard…

Statistical Methodology · Statistics 2026-06-26 Sergei Pankratev

Modeling Educational Performance Using School Demographics and Teacher Characteristics

High-dimensional educational datasets often exhibit sparsity, grouped predictors, and locally correlated covariates, limiting the effectiveness of conventional regression methods. We propose an Adaptive Weighted Group Fused LASSO estimator…

Statistical Methodology · Statistics 2026-06-26 Brianna Reed, Paramahansa Pramanik

Fast Approximate MM-Estimation for Outlier Robust Model Selection

Stratified robust model selection reduces the impact of large residuals and overrepresented outliers in bootstrap samples but is computationally intensive when fitting iteratively-solved robust estimators across many candidate models. We…

Statistical Methodology · Statistics 2026-06-26 Martin Huang, Samuel Muller, Garth Tarr

Structural Change Detection in Dynamic Systems

Structural changes often arise in real-world dynamic systems due to external interventions or environmental shifts, such as policy changes in epidemiology or climate forcing in environmental science. In this paper, we propose a unified…

Statistical Methodology · Statistics 2026-06-26 Wei Zhang, Fang Yao

Causal Inference for Functional Treatments with Stochastic Policies

Wearable devices can accurately measure human behavior, providing a unique opportunity to understand how behavior impacts health. Recent studies leveraging functional regression methods have found a strong relationship between…

Statistical Methodology · Statistics 2026-06-25 Martha Barnard, Jared D. Huling, Julian Wolfson

The Decision Geometry of Covariance Estimation for the Global Minimum-Variance Portfolio under Heavy Tails

The global minimum-variance portfolio (GMVP) is the canonical decision built from an estimated covariance matrix, yet covariance estimators are universally evaluated by matrix-norm loss, which is not the object the decision depends on. We…

Machine Learning · Statistics 2026-06-25 Xavier Fonseca

Directed Graph Topology Inference via Graph Filter Identification

We address the problem of inferring a directed network from nodal measurements generated by linear diffusion dynamics on the sought graph. Observations are modeled as the outputs of a graph convolutional filter, i.e., a polynomial (with…

Machine Learning · Statistics 2026-06-25 Rasoul Shafipour, Andrei Buciulea, Santiago Segarra, Antonio G. Marques, Gonzalo Mateos

When are likely answers right? On Sequence Probability and Correctness in LLMs

Many decoding methods for large language models can be understood as shifting probability mass toward outputs that are more likely under the model, either locally at the token level or globally at the sequence level. Therefore, their…

Machine Learning · Statistics 2026-06-25 Johannes Zenn, Jonas Geiping

Ribbon: Scalable Approximation and Robust Uncertainty Quantification

Reliably quantifying predictive uncertainty is difficult for complex, high-dimensional, or misspecified models. Both fully Bayesian and bootstrap resampling methods provide principled uncertainty estimates but are often too expensive for…

Machine Learning · Statistics 2026-06-25 Graham Gibson, John Tipton, Kellin Rumsey, Natalie Klein

Beyond Global Divergences: A Local-Mass Perspective on Bayesian Inference

Global objectives, such as KL divergence and ELBO, are widely used in Bayesian inference for measuring distributional discrepancy. This paper studies their local-mass behaviour that is not directly captured by such objectives. We introduce…

Machine Learning · Statistics 2026-06-25 Hanli Xu, Fengxiang He, Sarat Moka