Machine Learning - Scifaro

Extrapolation in Statistical Learning with Extreme Value Theory

Extreme value theory provides rigorous theory and statistical tools for extrapolation in machine learning, particularly in settings where traditional methods struggle due to data scarcity in the tails. A broad range of tasks benefit from…

Machine Learning · Statistics · 2026-05-05 · Sebastian Engelke, Nicola Gnecco, Anne Sabourin
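
The abstract is cut off before it describes a specific method. As a generic illustration of how extreme value theory extrapolates from the tail, here is a minimal sketch of the classical Hill estimator. This is a textbook estimator, not the authors' approach, and the name `hill_estimator` is hypothetical. It estimates a positive tail index $\xi$ from the $k$ largest observations as $\hat\xi = \frac{1}{k}\sum_{i=1}^{k} \log X_{(n-i+1)} - \log X_{(n-k)}$, assuming strictly positive, heavy-tailed data.

```python
import numpy as np

def hill_estimator(x, k):
    # Hypothetical helper: classical Hill estimator of a positive tail index xi.
    # Assumes x is strictly positive and heavy-tailed.
    xs = np.sort(np.asarray(x, dtype=float))[::-1]   # descending order statistics
    if not 0 < k < len(xs):
        raise ValueError("k must satisfy 0 < k < n")
    top = xs[:k]                # the k largest observations
    threshold = xs[k]           # the (k+1)-th largest value, X_(n-k)
    return float(np.mean(np.log(top) - np.log(threshold)))
```

For Pareto data with tail exponent $\alpha$, the estimate should be close to $1/\alpha$. The choice of $k$ trades bias (large $k$) against variance (small $k$).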

Adaptive Estimation and Inference in Semi-parametric Heterogeneous Clustered Multitask Learning via Neyman Orthogonality

We study clustered multitask learning in a semiparametric setting where tasks share a latent cluster structure in their target parameters but exhibit heterogeneous, potentially infinite-dimensional nuisance components. Such heterogeneity…

Machine Learning · Statistics · 2026-05-05 · Hanxiao Chen, Debarghya Mukherjee

Stable Blanket with Hidden Variables and Cycles

Stabilized regression aims to identify a set of predictors whose conditional relationship with a response variable remains invariant across different environments. Existing graphical characterizations of the stable blanket are mainly…

Machine Learning · Statistics · 2026-05-05 · Hanqing Xiang

A Semi-Supervised Kernel Two-Sample Test

We consider the problem of two-sample testing in a semi-supervised setting with abundant unlabeled covariate data. Standard two-sample tests neglect covariate information, which has the potential to significantly boost performance. However,…

Machine Learning · Statistics · 2026-05-05 · Gyumin Lee, Shubhanshu Shekhar, Ilmun Kim
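
For context, a standard (fully supervised) kernel two-sample test compares two samples through the maximum mean discrepancy (MMD) and calibrates the statistic by permutation. The sketch below makes three assumptions:

- a Gaussian kernel with a fixed bandwidth;
- the biased MMD² estimate;
- illustrative function names.

It ignores unlabeled covariates entirely, so it is the baseline kind of test the abstract contrasts with, not the semi-supervised test proposed in this paper.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth):
    # Pairwise squared Euclidean distances between rows of a and rows of b
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-sq / (2 * bandwidth**2))

def mmd2_biased(x, y, bandwidth):
    # Biased (V-statistic) estimate of the squared MMD between samples x and y
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    return kxx.mean() + kyy.mean() - 2 * kxy.mean()

def mmd_permutation_test(x, y, bandwidth=1.0, n_perm=200, seed=0):
    rng = np.random.default_rng(seed)
    observed = mmd2_biased(x, y, bandwidth)
    pooled = np.vstack([x, y])
    n = len(x)
    count = 0
    for _ in range(n_perm):
        # Under H0 the group labels are exchangeable, so shuffle them
        idx = rng.permutation(len(pooled))
        stat = mmd2_biased(pooled[idx[:n]], pooled[idx[n:]], bandwidth)
        count += stat >= observed
    # Add-one correction keeps the permutation p-value valid in finite samples
    return observed, (1 + count) / (1 + n_perm)
```

For example, comparing `x = rng.normal(0, 1, (100, 2))` with `y = rng.normal(1, 1, (100, 2))` should yield a small p-value. Two samples from the same distribution give a p-value that is roughly uniform on (0, 1].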

Distributional Causal Mediation via Conditional Generative Modeling

Mediation analysis has traditionally focused on outcome-level summary contrasts, such as mean effects, which may obscure substantial distributional changes induced by complex and nonlinear causal mechanisms. We propose Distributional Causal…

Machine Learning · Statistics · 2026-05-05 · Jinlun Zhang, Haoneng Huang, Zishu Zhan, Chunquan Ou

Missingness-aware Data Imputation via AI-powered Bayesian Generative Modeling

Missing data imputation remains a fundamental challenge in modern data science, especially when uncertainty quantification is essential. In this work, we propose MissBGM, an AI-powered missing data imputation method via Bayesian generative…

Machine Learning · Statistics · 2026-05-05 · Qiao Liu

Self-Normalized Martingales and Uniform Regret Bounds for Linear Regression

Self-normalized martingale inequalities lie at the heart of confidence ellipsoids for online least squares and, more broadly, many bandit and reinforcement-learning results. Yet existing vector and scalar results typically rely on bounded…

Machine Learning · Statistics · 2026-05-05 · Fan Chen, Jian Qian, Alexander Rakhlin, Nikita Zhivotovskiy
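
As background only, the classical self-normalized confidence ellipsoid for online ridge regression (Abbasi-Yadkori, Pál and Szepesvári, 2011) rests on two assumptions: conditionally $R$-sub-Gaussian noise and $\|\theta^*\|_2 \le S$. With $V = \lambda I + \sum_s x_s x_s^\top$, it guarantees $\|\hat\theta - \theta^*\|_V \le R\sqrt{2\log\big(\det(V)^{1/2}\det(\lambda I)^{-1/2}/\delta\big)} + \sqrt{\lambda}\,S$ with probability at least $1-\delta$. The sketch below computes that radius. It is a standard earlier result, not the uniform bound developed in this paper, and the helper names and default parameters are assumptions.

```python
import numpy as np

def ridge_confidence_ellipsoid(X, y, lam=1.0, R=1.0, S=1.0, delta=0.05):
    # Hypothetical helper: ridge estimate plus the classical self-normalized radius
    d = X.shape[1]
    V = lam * np.eye(d) + X.T @ X              # regularized Gram matrix
    theta_hat = np.linalg.solve(V, X.T @ y)    # ridge estimate
    _, logdet_V = np.linalg.slogdet(V)         # log det(V), numerically stable
    logdet_lam = d * np.log(lam)               # log det(lam * I)
    # 2 log(det(V)^{1/2} det(lam I)^{-1/2} / delta) expanded into log terms
    radius = R * np.sqrt(logdet_V - logdet_lam + 2 * np.log(1 / delta)) + np.sqrt(lam) * S
    return theta_hat, V, radius

def in_ellipsoid(theta, theta_hat, V, radius):
    # Checks ||theta_hat - theta||_V <= radius
    diff = theta_hat - theta
    return float(diff @ V @ diff) <= radius**2
```

Gaussian noise with standard deviation 0.5 is 0.5-sub-Gaussian, so `R=1.0` is a valid but conservative choice in that case.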

Stabilizing Private LASSO under Heterogeneous Covariates via Anisotropic Objective Perturbation

We study high-dimensional LASSO under differential privacy via objective perturbation with heterogeneous covariate scales. In practical scenarios, covariates often exhibit diverse scales; however, standard preprocessing is problematic under…

Machine Learning · Statistics · 2026-05-05 · Haruka Tanzawa, Ayaka Sakata

Mean Testing under Truncation beyond Gaussian

We characterize the fundamental limits of high-dimensional mean testing under arbitrary truncation, where samples are drawn from the conditional distribution $P(\cdot \mid S)$ for an unknown truncation set $S$ that may hide up to an…

Machine Learning · Statistics · 2026-05-05 · Yuhao Wang, Roberto Imbuzeiro Oliveira, Themis Gouleakis

The elbow statistic: Multiscale clustering statistical significance

Selecting the number of clusters remains a fundamental challenge in unsupervised learning. Existing approaches typically focus on identifying a single "optimal" partition, often overlooking statistically meaningful structure present across…

Machine Learning · Statistics · 2026-05-05 · Francisco J. Perez-Reche

Singular Bayesian Neural Networks

Bayesian neural networks promise calibrated uncertainty but require $O(mn)$ parameters for standard mean-field Gaussian posteriors. We argue this cost is often unnecessary, particularly when weight matrices exhibit fast singular value…

Machine Learning · Statistics · 2026-05-05 · Mame Diarra Toure, David A. Stephens
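
The $O(mn)$ cost in the abstract comes from storing one mean and one variance for each of the $mn$ weights of an $m \times n$ matrix. For comparison, consider a hypothetical mean-field posterior over the factors of a rank-$r$ factorization $W = UV^\top$, with $U \in \mathbb{R}^{m \times r}$ and $V \in \mathbb{R}^{n \times r}$. It needs $2r(m+n)$ parameters. This low-rank parameterization is an assumption suggested by the mention of singular value decay and is not necessarily the construction used in the paper.

```python
def mean_field_params(m, n):
    # One mean and one variance per weight of an m x n matrix
    return 2 * m * n

def low_rank_mean_field_params(m, n, r):
    # Hypothetical rank-r factorization W = U V^T with U (m x r) and V (n x r);
    # one mean and one variance per factor entry
    return 2 * r * (m + n)
```

For a $1024 \times 1024$ layer this gives 2,097,152 parameters under the full mean-field posterior versus 65,536 at rank 16, a 32-fold reduction.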

From Mice to Trains: Amortized Bayesian Inference on Graph Data

Graphs arise across diverse domains, from biology and chemistry to social and information networks, as well as in transportation and logistics. Inference on graph-structured data requires methods that are permutation-invariant, scalable…

Machine Learning · Statistics · 2026-05-05 · Svenja Jedhoff, Elizaveta Semenova, Aura Raulo, Anne Meyer, Paul-Christian Bürkner

$\phi$-Table: A Statistical Explanation for Global SHAP

Global SHAP explanations are typically presented as feature-importance rankings, which identify variables that matter to a black-box model but do not indicate whether their effects admit clear directional summaries, how uncertain those…

Machine Learning · Statistics · 2026-05-05 · Dongseok Kim, Hyoungsun Choi, Mohamed Jismy Aashik Rasool, Gisung Oh
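
The conventional global summary that the abstract questions ranks features by their mean absolute SHAP value. The sketch below assumes a precomputed $n \times p$ matrix of per-sample SHAP values from any explainer. Next to the ranking it reports the signed mean, which shows how a top-ranked feature can still lack a clear directional effect. It is not the $\phi$-Table method itself, and `global_shap_summary` is a hypothetical helper.

```python
import numpy as np

def global_shap_summary(phi, feature_names):
    # phi: (n_samples, n_features) array of per-sample SHAP values (assumed given)
    mean_abs = np.abs(phi).mean(axis=0)     # conventional global importance
    mean_signed = phi.mean(axis=0)          # signed average; opposite effects cancel
    order = np.argsort(-mean_abs)           # features in descending importance
    return [(feature_names[j], float(mean_abs[j]), float(mean_signed[j])) for j in order]
```

For instance, with `phi = np.array([[1.0, -2.0, 0.1], [-1.0, 2.0, 0.3]])` and names `["x1", "x2", "x3"]`, feature `x2` ranks first with a mean absolute value of 2.0 but has a signed mean of 0.0.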

Semi-Supervised Treatment Effect Estimation with Unlabeled Covariates for Prediction-Powered Causal Inference

This study investigates treatment effect estimation in the semi-supervised setting, which can also be interpreted as prediction-powered inference. In our setting, we can use not only the standard triple of covariates, treatment indicator, and…

Machine Learning · Statistics · 2026-05-05 · Masahiro Kato

Rate-optimal Design for Anytime Best Arm Identification

We consider the best arm identification problem, where the goal is to identify the arm with the highest mean reward from a set of $K$ arms under a limited sampling budget. This problem models many practical scenarios such as A/B testing. We…

Machine Learning · Statistics · 2026-05-05 · Junpei Komiyama, Kyoungseok Jang, Junya Honda
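
As a baseline for the limited-budget setting, round-robin sampling pulls the $K$ arms in turn and recommends the arm with the highest empirical mean. This is not the rate-optimal design proposed in the paper. Because a recommendation can be read off after any number of pulls, the baseline needs no advance knowledge of the budget; that is the sense of "anytime" assumed here. Bernoulli rewards and the helper name are also assumptions.

```python
import numpy as np

def round_robin_bai(means, budget, seed=0):
    # Hypothetical baseline: uniform (round-robin) allocation with Bernoulli rewards
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    k = len(means)
    sums = np.zeros(k)
    counts = np.zeros(k)
    for t in range(budget):
        arm = t % k                                # pull arms in a fixed cycle
        sums[arm] += rng.random() < means[arm]     # Bernoulli(means[arm]) reward
        counts[arm] += 1
    # Recommend the best empirical mean; arms never pulled default to 0
    return int(np.argmax(sums / np.maximum(counts, 1)))
```

Uniform allocation wastes samples on clearly inferior arms. Designs that adapt the allocation to the observed gaps aim to reduce the misidentification probability faster.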

Kernel Treatment Effects with Adaptively Collected Data

Adaptive experiments improve efficiency by adjusting treatment assignments based on past outcomes, but this adaptivity breaks the i.i.d. assumptions that underpin classical asymptotics. At the same time, many questions of interest are…

Machine Learning · Statistics · 2026-05-05 · Houssam Zenati, Bariscan Bozkurt, Arthur Gretton

Unfolded Laplacian Spectral Embedding: A Theoretically Grounded Approach to Dynamic Network Representation

Dynamic relational data arise in many machine learning applications, yet their evolving structure poses challenges for learning representations that remain consistent and interpretable over time. A common approach is to learn time-varying…

Machine Learning · Statistics · 2026-05-05 · Haruka Ezoe, Hiroki Matsumoto, Ryohei Hisano

A Category-Theoretic Analysis of Conformal Prediction

Conformal prediction (CP) produces prediction regions with finite-sample, distribution-free coverage guarantees, but its interpretation as a quantitative uncertainty tool is often left implicit. We develop a category-theoretic approach that…

Machine Learning · Statistics · 2026-05-05 · Michele Caprio
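
Split conformal prediction is the simplest construction with the finite-sample, distribution-free marginal coverage that the abstract refers to. The sketch below makes three assumptions:

- absolute residuals as the nonconformity score;
- exchangeable calibration and test points;
- an already fitted point predictor.

It illustrates ordinary conformal prediction, not the category-theoretic analysis of the paper, and the helper name is hypothetical.

```python
import numpy as np

def split_conformal_interval(residuals_cal, y_pred_test, alpha=0.1):
    # residuals_cal: y - y_hat on a held-out calibration set
    n = len(residuals_cal)
    scores = np.sort(np.abs(residuals_cal))
    k = int(np.ceil((n + 1) * (1 - alpha)))   # rank of the conformal quantile
    # With too few calibration points for this alpha, the interval is unbounded
    q = np.inf if k > n else scores[k - 1]
    return y_pred_test - q, y_pred_test + q
```

With `alpha=0.1` and 500 calibration residuals, `k = ceil(501 * 0.9) = 451`, so the interval half-width is the 451st smallest absolute residual. Under exchangeability the resulting intervals cover a new response with probability at least 0.9.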

Gaussian Approximation and Multiplier Bootstrap for Stochastic Gradient Descent

In this paper, we establish the non-asymptotic validity of the multiplier bootstrap procedure for constructing confidence sets using the Stochastic Gradient Descent (SGD) algorithm. Under appropriate regularity conditions, our approach…

Machine Learning · Statistics · 2026-05-05 · Marina Sheshukova, Sergey Samsonov, Denis Belomestny, Eric Moulines, Qi-Man Shao, Zhuo-Song Zhang, Alexey Naumov
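
Here is a minimal sketch of one common online form of the multiplier bootstrap for SGD, assuming a one-dimensional mean-estimation problem with squared loss. The procedure has three steps:

1. Run the ordinary SGD iterate.
2. Alongside it, run $B$ perturbed trajectories that rescale each stochastic gradient by an i.i.d. multiplier $W$ with $\mathbb{E}W = 1$ and $\operatorname{Var}W = 1$.
3. Form a confidence interval from the spread of the perturbed trajectories' Polyak-Ruppert averages around the ordinary average.

The step-size schedule, the two-point multiplier distribution $W \in \{0, 2\}$ and all names are assumptions. The paper's exact procedure and regularity conditions may differ.

```python
import numpy as np

def sgd_multiplier_bootstrap(x, n_boot=200, lr0=0.5, power=0.6, alpha=0.05, seed=0):
    # Hypothetical sketch: estimate the mean of x by SGD on the loss (theta - x)^2 / 2
    rng = np.random.default_rng(seed)
    theta, avg = 0.0, 0.0
    theta_b = np.zeros(n_boot)
    avg_b = np.zeros(n_boot)
    for t, xt in enumerate(x, start=1):
        lr = lr0 * t ** (-power)                      # Robbins-Monro step size
        theta -= lr * (theta - xt)                    # ordinary SGD step
        w = 2.0 * rng.integers(0, 2, size=n_boot)     # multipliers in {0, 2}: mean 1, variance 1
        theta_b -= lr * w * (theta_b - xt)            # perturbed SGD trajectories
        avg += (theta - avg) / t                      # running Polyak-Ruppert averages
        avg_b += (theta_b - avg_b) / t
    # Symmetric interval from the bootstrap spread around the ordinary average
    half = np.quantile(np.abs(avg_b - avg), 1 - alpha)
    return float(avg), (float(avg - half), float(avg + half))
```

With the default step sizes, the two-point multipliers keep $\gamma_t W \le 1$, so the perturbed trajectories cannot overshoot. Exponential multipliers are a common alternative.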

General Frameworks for Conditional Two-Sample Testing

We study the problem of conditional two-sample testing, which aims to determine whether two populations have the same distribution after accounting for confounding factors. This problem commonly arises in various applications, such as…

Machine Learning · Statistics · 2026-05-05 · Seongchan Lee, Suman Cha, Ilmun Kim