Machine Learning - Scifaro

Variational Search Distributions

We develop VSD, a method for conditioning a generative model of discrete, combinatorial designs on a rare desired class by efficiently evaluating a black-box (e.g. experiment, simulation) in a batch sequential manner. We call this task…

Machine Learning · Statistics 2025-11-24 Daniel M. Steinberg, Rafael Oliveira, Cheng Soon Ong, Edwin V. Bonilla

Interpretable Machine Learning for Survival Analysis

With the spread and rapid advancement of black box machine learning models, the field of interpretable machine learning (IML) or explainable artificial intelligence (XAI) has become increasingly important over the last decade. This is…

Machine Learning · Statistics 2025-11-24 Sophie Hanna Langbein, Mateusz Krzyziński, Mikołaj Spytek, Hubert Baniecki, Przemysław Biecek, Marvin N. Wright

Minimax Statistical Estimation under Wasserstein Contamination

Contaminations are a key concern in modern statistical learning, as small but systematic perturbations of all datapoints can substantially alter estimation results. Here, we study Wasserstein-$r$ contaminations ($r\ge 1$) in an $\ell_q$…

Machine Learning · Statistics 2025-11-24 Patrick Chao, Edgar Dobriban

Rate-optimal community detection near the KS threshold via node-robust algorithms

We study community detection in the symmetric $k$-stochastic block model, where $n$ nodes are evenly partitioned into $k$ clusters with intra- and inter-cluster connection probabilities $p$ and $q$, respectively. Our main result is a…

Machine Learning · Statistics 2025-11-21 Jingqiu Ding, Yiding Hua, Kasper Lindberg, David Steurer, Aleksandr Storozhenko

Time dependent loss reweighting for flow matching and diffusion models is theoretically justified

This brief note clarifies that, in Generator Matching (which subsumes a large family of flow matching and diffusion models over continuous, manifold, and discrete spaces), both the Bregman divergence loss and the linear parameterization of…

Machine Learning · Statistics 2025-11-21 Lukas Billera, Hedwig Nora Nordlinder, Ben Murrell

Spectral Identifiability for Interpretable Probe Geometry

Linear probes are widely used to interpret and evaluate neural representations, yet their reliability remains unclear, as probes may appear accurate in some regimes but collapse unpredictably in others. We uncover a spectral mechanism…

Machine Learning · Statistics 2025-11-21 William Hao-Cheng Huang

Angular Graph Fractional Fourier Transform: Theory and Application

Graph spectral representations are fundamental in graph signal processing, offering a rigorous framework for analyzing and processing graph-structured data. The graph fractional Fourier transform (GFRFT) extends the classical graph Fourier…

Machine Learning · Statistics 2025-11-21 Feiyue Zhao, Yangfan He, Zhichao Zhang

Atlas Gaussian processes on restricted domains and point clouds

In real-world applications, data often reside in restricted domains with unknown boundaries, or as high-dimensional point clouds lying on a lower-dimensional, nontrivial, unknown manifold. Traditional Gaussian Processes (GPs) struggle to…

Machine Learning · Statistics 2025-11-21 Mu Niu, Yue Zhang, Ke Ye, Pokman Cheung, Yizhu Wang, Xiaochen Yang

Non-Asymptotic Analysis of Data Augmentation for Precision Matrix Estimation

This paper addresses the problem of inverse covariance (also known as precision matrix) estimation in high-dimensional settings. Specifically, we focus on two classes of estimators: linear shrinkage estimators with a target proportional to…

Machine Learning · Statistics 2025-11-21 Lucas Morisset, Adrien Hardy, Alain Durmus

To Trust or Not to Trust: On Calibration in ML-based Resource Allocation for Wireless Networks

In next-generation communications and networks, machine learning (ML) models are expected to deliver not only accurate predictions but also well-calibrated confidence scores that reflect the true likelihood of correct decisions. This paper…

Machine Learning · Statistics 2025-11-21 Rashika Raina, Nidhi Simmons, David E. Simmons, Michel Daoud Yacoub, Trung Q. Duong

A Distributionally Robust Framework for Nuisance in Causal Effect Estimation

Causal inference requires evaluating models on balanced distributions between treatment and control groups, while training data often exhibits imbalance due to historical decision-making policies. Most conventional statistical methods…

Machine Learning · Statistics 2025-11-21 Akira Tanimoto

Enhancing Visual Feature Attribution via Weighted Integrated Gradients

Integrated Gradients (IG) is a widely used attribution method in explainable AI, particularly in computer vision applications where reliable feature attribution is essential. A key limitation of IG is its sensitivity to the choice of…

Machine Learning · Statistics 2025-11-21 Kien Tran Duc Tuan, Tam Nguyen Trong, Son Nguyen Hoang, Khoat Than, Anh Nguyen Duc

Fast convergence of the Expectation Maximization algorithm under a logarithmic Sobolev inequality

We present a new framework for analysing the Expectation Maximization (EM) algorithm. Drawing on recent advances in the theory of gradient flows over Euclidean-Wasserstein spaces, we extend techniques from alternating minimization in…

Machine Learning · Statistics 2025-11-21 Rocco Caprio, Adam M Johansen

Bipartite Graph Variational Auto-Encoder with Fair Latent Representation to Account for Sampling Bias in Ecological Networks

Citizen science monitoring programs can generate large amounts of valuable data, but are often affected by sampling bias. We focus on a citizen science initiative that records plant-pollinator interactions, with the goal of learning…

Machine Learning · Statistics 2025-11-21 Emre Anakok, Pierre Barbillon, Colin Fontaine, Elisa Thebault

Rényi Differential Privacy for Heavy-Tailed SDEs via Fractional Poincaré Inequalities

Characterizing the differential privacy (DP) of learning algorithms has become a major challenge in recent years. In parallel, many studies suggested investigating the behavior of stochastic gradient descent (SGD) with heavy-tailed noise,…

Machine Learning · Statistics 2025-11-20 Benjamin Dupuis, Mert Gürbüzbalaban, Umut Şimşekli, Jian Wang, Sinan Yildirim, Lingjiong Zhu

Near-optimal delta-convex estimation of Lipschitz functions

This paper presents a tractable algorithm for estimating an unknown Lipschitz function from noisy observations and establishes an upper bound on its convergence rate. The approach extends max-affine methods from convex shape-restricted…

Machine Learning · Statistics 2025-11-20 Gábor Balázs

A Physics Informed Machine Learning Framework for Optimal Sensor Placement and Parameter Estimation

Parameter estimation remains a challenging task across many areas of engineering. Because data acquisition can often be costly, limited, or prone to inaccuracies (noise, uncertainty), it is crucial to identify sensor configurations that…

Machine Learning · Statistics 2025-11-20 Georgios Venianakis, Constantinos Theodoropoulos, Michail Kavousanakis

Gini Score under Ties and Case Weights

The Gini score is a popular tool in statistical modeling and machine learning for model validation and model selection. It is a purely rank based score that allows one to assess risk rankings. The Gini score for statistical modeling has…

Machine Learning · Statistics 2025-11-20 Alexej Brauer, Mario V. Wüthrich

Exponential Lasso: robust sparse penalization under heavy-tailed noise and outliers with exponential-type loss

In high-dimensional statistics, the Lasso is a cornerstone method for simultaneous variable selection and parameter estimation. However, its reliance on the squared loss function renders it highly sensitive to outliers and heavy-tailed…

Machine Learning · Statistics 2025-11-20 The Tien Mai

Particle Monte Carlo methods for Lattice Field Theory

High-dimensional multimodal sampling problems from lattice field theory (LFT) have become important benchmarks for machine learning assisted sampling methods. We show that GPU-accelerated particle methods, Sequential Monte Carlo (SMC) and…

Machine Learning · Statistics 2025-11-20 David Yallup