Machine Learning - Scifaro

SRRM: Improving Recursive Transport Surrogates in the Small-Discrepancy Regime

Recursive partitioning methods provide computationally efficient surrogates for the Wasserstein distance, yet their statistical behavior and their resolution in the small-discrepancy regime remain insufficiently understood. We study…

Machine Learning · Statistics 2026-03-20 Yufei Zhang, Tao Wang, Jingyi Zhang

Precise Performance of Linear Denoisers in the Proportional Regime

In the present paper we study the performance of linear denoisers for noisy data of the form $\mathbf{x} + \mathbf{z}$, where $\mathbf{x} \in \mathbb{R}^d$ is the desired data with zero mean and unknown covariance $\mathbf{\Sigma}$, and…

Machine Learning · Statistics 2026-03-20 Reza Ghane, Danil Akhtiamov, Babak Hassibi

A Hybrid Conditional Diffusion-DeepONet Framework for High-Fidelity Stress Prediction in Hyperelastic Materials

Predicting stress fields in hyperelastic materials with complex microstructures remains challenging for traditional deep learning surrogates, which struggle to capture both sharp stress concentrations and the wide dynamic range of stress…

Machine Learning · Statistics 2026-03-20 Purna Vindhya Kota, Meer Mehran Rashid, Somdatta Goswami, Lori Graham-Brady

Starting Off on the Wrong Foot: Pitfalls in Data Preparation

When working with real-world insurance data, practitioners often encounter challenges during the data preparation stage that can undermine the statistical validity and reliability of downstream modeling. This study illustrates that…

Machine Learning · Statistics 2026-03-20 Jiayi Guo, Panyi Dong, Zhiyu Quan

A Structured Nonparametric Framework for Nonlinear Accelerated Failure Time Models (KAN-AFT)

Accelerated failure time (AFT) models provide a direct and interpretable time-scale description of covariate effects in lifetime data analysis, but classical formulations rely on linear predictors and are therefore limited in their ability…

Machine Learning · Statistics 2026-03-20 Mebin Jose, Jisha Francis, Sudheesh Kumar Kattumannil

Resonances in reflective Hamiltonian Monte Carlo

In high dimensions, reflective Hamiltonian Monte Carlo with inexact reflections exhibits slow mixing when the particle ensemble is initialised from a Dirac delta distribution and the uniform distribution is targeted. By quantifying the…

Machine Learning · Statistics 2026-03-20 Namu Kroupa, Gábor Csányi, Will Handley

Multifidelity Simulation-based Inference for Computationally Expensive Simulators

Across many domains of science, stochastic models are an essential tool to understand the mechanisms underlying empirically observed data. Models can be of different levels of detail and accuracy, with models of high-fidelity (i.e., high…

Machine Learning · Statistics 2026-03-20 Anastasia N. Krouglova, Hayden R. Johnson, Basile Confavreux, Michael Deistler, Pedro J. Gonçalves

Assessing the Distributional Fidelity of Synthetic Chest X-rays using the Embedded Characteristic Score

Chest X-ray (CXR) images are among the most commonly used diagnostic imaging modalities in clinical practice. Stringent privacy constraints often limit the public dissemination of patient CXR images, contributing to the increasing use of…

Machine Learning · Statistics 2026-03-20 Edric Tam, Barbara E Engelhardt

Combining T-learning and DR-learning: a framework for oracle-efficient estimation of causal contrasts

We introduce efficient plug-in (EP) learning, a novel framework for the estimation of heterogeneous causal contrasts, such as the conditional average treatment effect and conditional relative risk. The EP-learning framework enjoys the same…

Machine Learning · Statistics 2026-03-20 Lars van der Laan, Marco Carone, Alex Luedtke

Hidden yet quantifiable: A lower bound for confounding strength using randomized trials

In the era of fast-paced precision medicine, observational studies play a major role in properly evaluating new treatments in clinical practice. Yet, unobserved confounding can significantly compromise causal conclusions drawn from…

Machine Learning · Statistics 2026-03-20 Piersilvio De Bartolomeis, Javier Abad, Konstantin Donhauser, Fanny Yang

A Noise Sensitivity Exponent Controls Large Statistical-to-Computational Gaps in Single- and Multi-Index Models

Understanding when learning is statistically possible yet computationally hard is a central challenge in high-dimensional statistics. In this work, we investigate this question in the context of single- and multi-index models, classes of…

Machine Learning · Statistics 2026-03-19 Leonardo Defilippis, Florent Krzakala, Bruno Loureiro, Antoine Maillard

rSDNet: Unified Robust Neural Learning against Label Noise and Adversarial Attacks

Neural networks are central to modern artificial intelligence, yet their training remains highly sensitive to data contamination. Standard neural classifiers are trained by minimizing the categorical cross-entropy loss, corresponding to…

Machine Learning · Statistics 2026-03-19 Suryasis Jana, Abhik Ghosh

Gaussian Process Limit Reveals Structural Benefits of Graph Transformers

Graph transformers are the state-of-the-art for learning from graph-structured data and are empirically known to avoid several pitfalls of message-passing architectures. However, there is limited theoretical analysis on why these models…

Machine Learning · Statistics 2026-03-19 Nil Ayday, Lingchu Yang, Debarghya Ghoshdastidar

Consistency of the $k$-Nearest Neighbor Regressor under Complex Survey Designs

We study the consistency of the $k$-nearest neighbor regressor under complex survey designs. While consistency results for this algorithm are well established for independent and identically distributed data, corresponding results for…

Machine Learning · Statistics 2026-03-19 Caren Hasler

Mirror Descent on Riemannian Manifolds

Mirror Descent (MD) is a scalable first-order method widely used in large-scale optimization, with applications in image processing, policy optimization, and neural network training. This paper generalizes MD to optimization on Riemannian…

Machine Learning · Statistics 2026-03-19 Jiaxin Jiang, Lei Shi, Jiyuan Tan

Self-Regularized Learning Methods

We introduce a general framework for analyzing learning algorithms based on the notion of self-regularization, which captures implicit complexity control without requiring explicit regularization. This is motivated by previous observations…

Machine Learning · Statistics 2026-03-19 Max Schölpple, Liu Fanghui, Ingo Steinwart

Kriging via variably scaled kernels

Classical Gaussian processes and Kriging models are commonly based on stationary kernels, whereby correlations between observations depend exclusively on the relative distance between scattered data. While this assumption ensures analytical…

Machine Learning · Statistics 2026-03-19 Gianluca Audone, Francesco Marchetti, Emma Perracchione, Milvia Rossini

AR-Flow VAE: A Structured Autoregressive Flow Prior Variational Autoencoder for Unsupervised Blind Source Separation

Blind source separation (BSS) seeks to recover latent source signals from observed mixtures. Variational autoencoders (VAEs) offer a natural perspective for this problem: the latent variables can be interpreted as source components, the…

Machine Learning · Statistics 2026-03-19 Yuan-Hao Wei, Fu-Hao Deng, Lin-Yong Cui, Yan-Jie Sun

Dual Space Preconditioning for Gradient Descent in the Overparameterized Regime

In this work we study the convergence properties of the Dual Space Preconditioned Gradient Descent, encompassing optimizers such as Normalized Gradient Descent, Gradient Clipping and Adam. We consider preconditioners of the form $\nabla K$,…

Machine Learning · Statistics 2026-03-19 Reza Ghane, Danil Akhtiamov, Babak Hassibi

Exact Generalisation Error Exposes Benchmarks Skew Graph Neural Networks Success (or Failure)

Graph Neural Networks (GNNs) have become the standard method for learning from networks across fields ranging from biology to social systems, yet a principled understanding of what enables them to extract meaningful representations, or why…

Machine Learning · Statistics 2026-03-19 Nil Ayday, Mahalakshmi Sabanayagam, Debarghya Ghoshdastidar