Statistics — Scifaro

Efficient Benchmarking Is Just Feature Selection and Multiple Regression

Efficient benchmarking techniques aim to lower the computational cost of evaluating LLMs by predicting full benchmark scores using only a subset of a benchmark's questions. By reframing this problem as an instance of multiple regression…

Machine Learning · Statistics 2026-05-26 Sam Bowyer, Acyr Locatelli, Kris Cao

Stein-Encoder: A White-Box Supervised Encoder via Stein Identities in Multi-Modal Studies

In multi-modal biomedical research, integrating high-dimensional genomic data with clinical baselines is essential for precision medicine. However, standard deep neural network approaches often entangle these modalities, obscuring the…

Applications · Statistics 2026-05-26 Jiarui Zhang, Shuoxun Xu, Jiasheng Shi, Xinzhou Guo

StrTransformer: Source-Wise Structured Transformers for Unsupervised Blind Source Recovery

This paper proposes StrTransformer, a source-wise structured Transformer framework for blind source recovery and branch-wise latent modeling. Instead of using an encoder to infer latent variables, StrTransformer directly optimizes the…

Machine Learning · Statistics 2026-05-26 Yuan-Hao Wei

Learning Sparse Compositional Functions with Norm-Constrained Neural Networks

The ability of deep neural networks to learn hierarchical features is widely regarded as a key mechanism underlying their success in high-dimensional learning. Existing theory partially supports this view by establishing approximation rates…

Machine Learning · Statistics 2026-05-26 Shuo Huang, Lorenzo Fiorito, Lorenzo Rosasco, Tomaso Poggio

Optimal Design for Multinomial Logit Model with Applications to Best Assortment Identification

We study optimal experimental design for multinomial logit (MNL) bandits, where an agent repeatedly selects a subset of $K$ items from a ground set of size $N$ and observes single-choice feedback. Unlike linear or generalized linear…

Machine Learning · Statistics 2026-05-26 Joongkyu Lee, Min-hwan Oh

Nonstationary Generalized Linear Bandits with Discounted Online Mirror Descent

We study nonstationary generalized linear bandits (GLBs), where the expected reward is modeled through a nonlinear link function with an unknown time-varying parameter. This framework encompasses a broad class of reward models, including…

Machine Learning · Statistics 2026-05-26 Joongkyu Lee, Min-hwan Oh

From DPPs to $k$-DPPs: identifiability analysis via spectral decomposition

We study the geometry of determinantal point processes (DPPs) through the spectral decomposition $L=U\Lambda U^{\top}$. The spectrum $\Lambda$ governs the cardinality distribution via elementary symmetric polynomials, while the eigenspace…

Machine Learning · Statistics 2026-05-26 Hideitsu Hino, Keisuke Yano

Guided Flow Matching for Forward and Inverse PDE Problems with Sparse Observations: Algorithm and Theory

Reconstructing PDE solutions from sparse observations is a core challenge in scientific computing. We present FM4PDE, a flow-matching generative framework that learns the joint distribution of PDE coefficients (or initial states) and…

Machine Learning · Statistics 2026-05-26 Xifeng Zhang, Jin Zhao

Estimation of Directed Acyclic Graphs by Frequentist Model Averaging

Directed acyclic graphs provide a fundamental tool for representing directed dependence structures in multivariate network data, and are widely used to model financial and economic networks. However, accurate and interpretable estimation…

Methodology · Statistics 2026-05-26 Huihang Liu, Wenhui Li, Xinyu Zhang

Transcripts and Algebraic Distances in Time Series: Stochastic Properties and Nonparametric Dependence Tests

The use of ordinal patterns (OPs) for analyzing the dependence structure of univariate and continuously distributed processes has gained popularity in recent years. This research goes one step further and considers the transcripts being…

Methodology · Statistics 2026-05-26 Christian H. Weiß, José M. Amigó

Mean-Shift PCA by Knockoff Mean

Removing noise is difficult, but adding noise is easy. In this work, we show how to eliminate mean-shift noisy components from PCA by deliberately introducing knockoff mean-shift perturbation. Standard PCA is highly sensitive to shifts in…

Machine Learning · Statistics 2026-05-26 Mengda Li, Zeng Li, Jianfeng Yao

Different Statistical Perspectives for Understanding Generalisation in Graph Neural Networks

Graph Neural Networks (GNN) are currently the most popular approach for learning and prediction on graph-structured data and are deployed in various fields, from social network analysis to drug discovery. However, there is limited…

Methodology · Statistics 2026-05-26 Nil Ayday, Mahalakshmi Sabanayagam, Debarghya Ghoshdastidar

Learning manifold diffusion semigroups from graph transition matrices

We consider graph diffusion processes constructed from finite i.i.d. samples drawn from an unknown manifold embedded in ambient Euclidean space, where the graph affinity is defined by an ambient Gaussian kernel matrix. We show that the…

Machine Learning · Statistics 2026-05-26 Xiuyuan Cheng, Nan Wu

Rank-Based Tests for Mutual Independence of High-Dimensional Random Vectors via $L_q$ Norm

We consider the problem of testing mutual independence among the components of a high-dimensional random vector. Building on the rank-based max-sum framework, we introduce fixed finite-$L_q$ power-sum statistics under three general classes…

Methodology · Statistics 2026-05-26 Ping Zhao, Hongfei Wang, Long Feng

Choosing Online Experiment Designs under Interference in Ads, Recommendations, and Member-Experience Systems

Online experiments in ads, recommendation, and member-experience systems are often planned before the dominant interference mechanism is known. A treatment may propagate through budgets, inventory, producer exposure, graph spillovers, or…

Machine Learning · Statistics 2026-05-26 Prashant Shekhar, Caroline Howard

A Statistical Physics View of the S&P 500: Pairwise Interactions and Time-Varying Dynamics

We analyze a fixed panel of S&P 500 stocks from 1996 to 2026 using complementary static and kinetic Ising models applied to daily binary open-to-close movements. The static pairwise model provides a long-run maximum-entropy summary of…

Applications · Statistics 2026-05-26 Sebin Oh, Marta C. González, Ziqi Wang

Nyström Kernel Stein Discrepancy Tests

Kernel Stein discrepancy (KSD) is among the most popular goodness-of-fit (GoF) measures on general domains with a large number of successful deployments. One of the main applications of KSD is in constructing powerful GoF tests. However,…

Machine Learning · Statistics 2026-05-26 Florian Kalinke, Zoltán Szabó, Bharath K. Sriperumbudur

Rejoinder: The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review

This article is the rejoinder to "The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review," to appear in the Journal of the American Statistical Association with discussion. To address the practical and…

Applications · Statistics 2026-05-26 Buxin Su, Jiayao Zhang, Natalie Collina, Yuling Yan, Didong Li, Kyunghyun Cho, Jianqing Fan, Aaron Roth, Weijie Su

Counterfactually Safe Reinforcement Learning

Reinforcement learning algorithms are generally designed to maximize the expected return across a population. However, a policy that is optimal on average may be suboptimal for certain individuals, leading to potential safety concerns. To…

Machine Learning · Statistics 2026-05-26 Jingyi Li, Peng Wu, Chengchun Shi

Multimodality Stacking with Blockwise Missing Values and Application to the PIONeeR Biomarkers Study for Prediction of Resistance to Immunotherapy

Integrating multimodal datasets in clinical oncology is frequently hindered by high dimensionality and blockwise missingness, where entire data sources are unavailable for specific patient subsets. Standard survival models often struggle…

Applications · Statistics 2026-05-26 Mohamed Boussena, Florence Monville, Jacques Fieschi-Meric, Frederic Vely, Pierre Milpied, Julien Mazieres, Maurice Perol, Eric Vivier, Laurent Greillier, Fabrice Barlesi, Sebastien Benzekry