Machine Learning - Scifaro

Multi-Head Attention as Ensemble Nadaraya-Watson Estimation: Variance Reduction, Decorrelation, and Optimal Head Diversity

We develop a rigorous statistical theory of multi-head attention (MHA) as an ensemble of Nadaraya-Watson (NW) kernel regression estimators. Building on the algebraic identity between single-head softmax attention and the NW estimator, we…

Machine Learning · Statistics 2026-05-21 Ernest Fokoué

Sliced-Regularized Optimal Transport

We propose a new regularized optimal transport (OT) formulation, termed sliced-regularized optimal transport (SROT). Unlike entropic OT (EOT), which regularizes the transport plan toward an independent coupling, SROT regularizes it toward a…

Machine Learning · Statistics 2026-05-21 Khai Nguyen

A theory of learning data statistics in diffusion models, from easy to hard

While diffusion models have emerged as a powerful class of generative models, their learning dynamics remain poorly understood. We address this issue first by empirically showing that standard diffusion models trained on natural images…

Machine Learning · Statistics 2026-05-21 Lorenzo Bardone, Claudia Merger, Sebastian Goldt

Cluster-Based Generalized Additive Models Informed by Random Fourier Features

In developing data-driven modeling methodologies, there is an ongoing need to reconcile the strong predictive performance of opaque black-box models with the transparency required for critical applications. This work introduces an…

Machine Learning · Statistics 2026-05-21 Xin Huang, Jia Li, Jun Yu

Maxitive Donsker-Varadhan Formulation for Possibilistic Variational Inference

Variational inference (VI) is a cornerstone of modern Bayesian learning, enabling approximate inference in complex models. However, its formulation depends on expectations and divergences defined through high-dimensional integrals, often…

Machine Learning · Statistics 2026-05-21 Jasraj Singh, Shelvia Wongso, Jeremie Houssineau, Badr-Eddine Chérief-Abdellatif

Batched Single-Index Global Multi-Armed Bandits with Covariates

The multi-armed bandits (MAB) framework is a widely used approach for sequential decision-making, where a decision-maker selects an arm in each round with the goal of maximizing long-term rewards. In many practical applications, such as…

Machine Learning · Statistics 2026-05-21 Sakshi Arya, Hyebin Song

Improved convergence rate of kNN graph Laplacians: differentiable self-tuned affinity

In graph-based data analysis, $k$-nearest neighbor ($k$NN) graphs are widely used due to their adaptivity to local data densities. Allowing weighted edges in the graph, the kernelized graph affinity provides a more general type of $k$NN…

Machine Learning · Statistics 2026-05-21 Xiuyuan Cheng, Yixuan Tan, Nan Wu

Computational-Statistical Trade-off in Kernel Two-Sample Testing with Random Fourier Features

Recent years have seen a surge in methods for two-sample testing, among which the Maximum Mean Discrepancy (MMD) test has emerged as an effective tool for handling complex and high-dimensional data. Despite its success and widespread…

Machine Learning · Statistics 2026-05-21 Ikjun Choi, Ilmun Kim

Goal-Oriented Lower-Tail Calibration of Gaussian Processes for Bayesian Optimization

Bayesian optimization (BO) selects evaluation points for expensive black-box objectives using Gaussian process (GP) predictive distributions. Kernel choice and hyperparameter selection can lead to miscalibrated predictive distributions and…

Machine Learning · Statistics 2026-05-20 Aurélien Pion, Emmanuel Vazquez

Optimizing Computational-Statistical Runtime for Wasserstein Distance Estimation

Squared Wasserstein distance is a frequently used tool to measure discrepancy between probability distributions. This distance is typically computed between empirical measures of size $n$ from two underlying random samples. Unfortunately,…

Machine Learning · Statistics 2026-05-20 Peter Matthew Jacobs, Jeff M. Phillips

Tail Annealing for Heavy-Tailed Flow Matching

Standard generative models struggle with heavy-tailed data: Lipschitz architectures cannot produce power-law tails from Gaussian noise, and interpolating between heavy-tailed data and Gaussians is ill-posed. We propose a simple fix: apply…

Machine Learning · Statistics 2026-05-20 Jean Pachebat

Probabilistic Multivariate Time Series Forecasting with Diffusion Copulas

Accurately assessing financial risk requires capturing both individual asset volatility and the complex, asymmetric dependence structures that emerge during extreme market events. While modern diffusion-based models have advanced…

Machine Learning · Statistics 2026-05-20 David Huk, Dongshan Wang, Miha Bresar

Increasing Missingness to Reduce Bias: Richardson-SGD with Missing Data

Stochastic gradient methods are central to modern large-scale learning, but their use with incomplete covariates remains delicate since imputation schemes generally introduce systematic gradient biases, as shown for linear models. In this…

Machine Learning · Statistics 2026-05-20 Ferdinand Genans, Erwan Scornet

Gaussian Approximation and Multiplier Bootstrap for Federated Linear Stochastic Approximation

In this paper, we establish Berry-Esseen-type bounds for federated linear stochastic approximation (LSA). Our results provide the first federated Gaussian approximations for LSA that explicitly capture communication-computation trade-offs…

Machine Learning · Statistics 2026-05-20 Ilya Levin, Maksim Shuklin, Eric Moulines, Paul Mangold, Sergey Samsonov

Posterior Contraction of Lévy Adaptive B-spline Regression in Besov Spaces

We investigate the asymptotic properties of the Lévy Adaptive B-spline (LABS) regression model, a Bayesian nonparametric method that incorporates B-spline kernels into the Lévy Adaptive Regression Kernel (LARK) model. LABS applies…

Machine Learning · Statistics 2026-05-20 Jeunghun Oh, Sewon Park, Jaeyong Lee

Density-Ratio Losses for Post-Hoc Learning to Defer

We study post-hoc Learning to Defer (L2D) through the lens of ideal distributions: divergence-regularized reweightings of the data distribution under which a model attains low loss. We define deferral via the density-ratio between a model's…

Machine Learning · Statistics 2026-05-20 Alexander Soen, Ragnar Thobaben, Joakim Jaldén, Richard Nock

Tweedie's Formulae and Diffusion Generative Models Beyond Gaussian

Diffusion models have achieved remarkable success in generating samples from unknown data distributions. Most popular stochastic differential equation-based diffusion models perturb the target distribution by adding Gaussian noise,…

Machine Learning · Statistics 2026-05-20 Wenpin Tang, Nizar Touzi, Zikun Zhang, Xun Yu Zhou

A Unified Framework for Structure-Aware Clustering and Heterogeneous Causal Graph Learning

In complex multivariate systems, interactions among variables are defined by dependency structures, often encoded as directed acyclic graphs (DAGs). However, dependency structures can vary across subjects, and ignoring this…

Machine Learning · Statistics 2026-05-20 Honglin Du, Muxuan Liang, Xiang Zhong

Factor Augmented High-Dimensional SGD

Stochastic gradient descent (SGD) is a fundamental optimization algorithm widely used in modern machine learning. In this paper, we propose Factor-Augmented SGD (FSGD), a new optimization method that leverages latent factor representations…

Machine Learning · Statistics 2026-05-20 Shubo Li, Yuefeng Han, Xiufan Yu

Dual-Channel Tensor Neural Networks: Finite-Sample Theory and Conformal Structure Selection

Tensor-valued data arise naturally in neuroimaging, genomics, climate science, and spatiotemporal networks, where multilinear dependencies across modes carry information that is destroyed under vectorization. Existing approaches either…

Machine Learning · Statistics 2026-05-20 Elynn Chen, Jiayu Li, Zheshi Zheng, Jian Pei