Machine Learning - Scifaro

Calibeating Prediction-Powered Inference

We study semisupervised mean estimation with a small labeled sample, a large unlabeled sample, and a black-box prediction model whose output may be miscalibrated. A standard approach in this setting is augmented inverse-probability…

Machine Learning · Statistics 2026-04-24 Lars van der Laan, Mark Van Der Laan

Refining Covariance Matrix Estimation in Stochastic Gradient Descent Through Bias Reduction

We study online inference and asymptotic covariance estimation for the stochastic gradient descent (SGD) algorithm. While classical methods (such as plug-in and batch-means estimators) are available, they either require inaccessible…

Machine Learning · Statistics 2026-04-24 Ziyang Wei, Wanrong Zhu, Jingyang Lyu, Wei Biao Wu

Learning to Emulate Chaos: Adversarial Optimal Transport Regularization

Chaos arises in many complex dynamical systems, from weather to power grids, but is difficult to accurately model using data-driven emulators, including neural operator architectures. For chaotic systems, the inherent sensitivity to initial…

Machine Learning · Statistics 2026-04-24 Gabriel Melo, Leonardo Santiago, Peter Y. Lu

Achieving the Kesten-Stigum bound in the non-uniform hypergraph stochastic block model

We study the community detection problem in the non-uniform hypergraph stochastic block model (HSBM), where hyperedges of varying sizes coexist. This setting captures higher-order and multi-view interactions and raises a fundamental…

Machine Learning · Statistics 2026-04-24 Manuel Fernandez, Ludovic Stephan, Yizhe Zhu

Decentralized Machine Learning with Centralized Performance Guarantees via Gibbs Algorithms

In this paper, it is shown, for the first time, that centralized performance is achievable in decentralized learning without sharing the local datasets. Specifically, when clients adopt an empirical risk minimization with relative-entropy…

Machine Learning · Statistics 2026-04-24 Yaiza Bermudez, Samir M. Perlaza, Iñaki Esnaola

PDGMM-VAE: A Variational Autoencoder with Adaptive Per-Dimension Gaussian Mixture Model Priors for Nonlinear ICA

Independent component analysis is a core framework within blind source separation for recovering latent source signals from observed mixtures under statistical independence assumptions. In this work, we propose PDGMM-VAE, a source-oriented…

Machine Learning · Statistics 2026-04-24 Yuan-Hao Wei, Yan-Jie Sun

Spatio-temporal probabilistic forecast using MMAF-guided learning

We present a theory-guided generalized Bayesian methodology for spatio-temporal raster data, which we use to train an ensemble of stochastic feed-forward neural networks with Gaussian-distributed weights. The methodology incorporates the…

Machine Learning · Statistics 2026-04-24 Leonardo Bardi, Imma Valentina Curato, Lorenzo Proietti

Generalization Properties of Score-matching Diffusion Models for Intrinsically Low-dimensional Data

Despite the remarkable empirical success of score-based diffusion models, their statistical guarantees remain underdeveloped. Existing analyses often provide pessimistic convergence rates that do not reflect the intrinsic low-dimensional…

Machine Learning · Statistics 2026-04-24 Saptarshi Chakraborty, Quentin Berthet, Peter L. Bartlett

When Langevin Monte Carlo Meets Randomization: New Sampling Algorithms with Non-asymptotic Error Bounds beyond Log-Concavity and Gradient Lipschitzness

Efficient sampling from complex and high dimensional target distributions turns out to be a fundamental task in diverse disciplines such as scientific computing, statistics and machine learning. In this paper, we propose a new kind of…

Machine Learning · Statistics 2026-04-24 Xiaojie Wang, Bin Yang

Variable Selection Using Relative Importance Rankings

Although conceptually related, variable selection and relative importance (RI) analysis have been treated quite differently in the literature. While RI is typically used for post-hoc model explanation, this paper explores its potential for…

Machine Learning · Statistics 2026-04-24 Tien-En Chang, Argon Chen

Nonlinear Causal Discovery through a Sequential Edge Orientation Approach

Recent advances have established the identifiability of a directed acyclic graph (DAG) under additive noise models (ANMs), spurring the development of various causal discovery methods. However, most existing methods make restrictive model…

Machine Learning · Statistics 2026-04-24 Stella Huang, Qing Zhou

Weighted quantization using MMD: From mean field to mean shift via gradient flows

Approximating a probability distribution using a set of particles is a fundamental problem in machine learning and statistics, with applications including clustering and quantization. Formally, we seek a weighted mixture of Dirac measures…

Machine Learning · Statistics 2026-04-24 Ayoub Belhadji, Daniel Sharp, Youssef Marzouk

A Large-Scale Neutral Comparison Study of Survival Models on Low-Dimensional Data

This work presents the first large-scale neutral benchmark experiment focused on single-event, right-censored, low-dimensional survival data. Benchmark experiments are essential in methodological research to scientifically compare new and…

Machine Learning · Statistics 2026-04-24 Lukas Burk, John Zobolas, Bernd Bischl, Andreas Bender, Marvin N. Wright, Raphael Sonabend

Is K-fold cross validation the best model selection method for Machine Learning?

As a technique that can compactly represent complex patterns, machine learning has significant potential for predictive inference. K-fold cross-validation (CV) is the most common approach to ascertaining the likelihood that a machine…

Machine Learning · Statistics 2026-04-24 Juan M Gorriz, R. Martin Clemente, F Segovia, J Ramirez, A Ortiz, J. Suckling

Convergence Rates for Non-Log-Concave Sampling and Log-Partition Estimation

Sampling from Gibbs distributions and computing their log-partition function are fundamental tasks in statistics, machine learning, and statistical physics. While efficient algorithms are known for log-concave densities, the worst-case…

Machine Learning · Statistics 2026-04-24 David Holzmüller, Francis Bach

Geometric Renyi Differential Privacy: Ricci Curvature Characterized by Heat Diffusion Mechanisms

In this paper, we develop a novel privacy mechanism for Riemannian manifold-valued data. Our key contribution lies in uncovering unexpected connections among geometric analysis, heat diffusion models, and differential privacy (DP). We…

Machine Learning · Statistics 2026-04-23 Xiaotian Chang, Yangdi Jiang, Cyrus Mostajeran, Qirui Hu

On Bayesian Softmax-Gated Mixture-of-Experts Models

Mixture-of-experts models provide a flexible framework for learning complex probabilistic input-output relationships by combining multiple expert models through an input-dependent gating mechanism. These models have become increasingly…

Machine Learning · Statistics 2026-04-23 Nicola Bariletto, Huy Nguyen, Nhat Ho, Alessandro Rinaldo

Efficient Symbolic Computations for Identifying Causal Effects

Determining identifiability of causal effects from observational data under latent confounding is a central challenge in causal inference. For linear structural causal models, identifiability of causal effects is decidable through symbolic…

Machine Learning · Statistics 2026-04-23 Benjamin Hollering, Pratik Misra, Nils Sturma

Properties and limitations of geometric tempering for gradient flow dynamics

We consider the problem of sampling from a probability distribution $\pi$. It is well known that this can be written as an optimisation problem over the space of probability distributions in which we aim to minimise the Kullback-Leibler…

Machine Learning · Statistics 2026-04-23 Francesca Romana Crucinio, Sahani Pathiraja

Online Survival Analysis: A Bandit Approach under Cox PH Model

Survival analysis is a widely used statistical framework for modeling time-to-event data under censoring. Classical methods, such as the Cox proportional hazards (Cox PH) model, offer a semiparametric approach to estimating the effects of…

Machine Learning · Statistics 2026-04-23 Yang Xu, Wenbin Lu, Rui Song