Machine Learning - Scifaro

Mixing Times and Privacy Analysis for the Projected Langevin Algorithm under a Modulus of Continuity

We study the mixing time of the projected Langevin algorithm (LA) and the privacy curve of noisy Stochastic Gradient Descent (SGD), beyond nonexpansive iterations. Specifically, we derive new mixing time bounds for the projected LA which…

Machine Learning · Statistics 2026-03-03 Mario Bravo, Juan P. Flores-Mella, Cristóbal Guzmán

Learning sparsity-promoting regularizers for linear inverse problems

This paper introduces a novel approach to learning sparsity-promoting regularizers for solving linear inverse problems. We develop a bilevel optimization framework to select an optimal synthesis operator, denoted as $B$, which regularizes…

Machine Learning · Statistics 2026-03-03 Giovanni S. Alberti, Ernesto De Vito, Tapio Helin, Matti Lassas, Luca Ratti, Matteo Santacesaria

LD-EnSF: Synergizing Latent Dynamics with Ensemble Score Filters for Fast Data Assimilation with Sparse Observations

Data assimilation techniques are crucial for accurately tracking complex dynamical systems by integrating observational data with numerical forecasts. Recently, score-based data assimilation methods emerged as powerful tools for…

Machine Learning · Statistics 2026-03-03 Pengpeng Xiao, Phillip Si, Peng Chen

Decoupling Dynamical Richness from Representation Learning: Towards Practical Measurement

Dynamic feature transformation (the rich regime) does not always align with predictive performance (better representation), yet accuracy is often used as a proxy for richness, limiting analysis of their relationship. We propose a…

Machine Learning · Statistics 2026-03-03 Yoonsoo Nam, Nayara Fonseca, Seok Hyeong Lee, Chris Mingard, Niclas Goring, Ouns El Harzli, Abdurrahman Hadi Erturk, Soufiane Hayou, Ard A. Louis

VICatMix: variational Bayesian clustering and variable selection for discrete biomedical data

Effective clustering of biomedical data is crucial in precision medicine, enabling accurate stratification of patients or samples. However, the growth in availability of high-dimensional categorical data, including 'omics data, necessitates…

Machine Learning · Statistics 2026-03-03 Jackie Rao, Paul D. W. Kirk

Active Bipartite Ranking with Smooth Posterior Distributions

In this article, bipartite ranking, a statistical learning problem involved in many applications and widely studied in the passive context, is approached in a much more general active setting than the discrete one previously…

Machine Learning · Statistics 2026-03-02 James Cheshire, Stephan Clémençon

A Variational Estimator for $L_p$ Calibration Errors

Calibration, the problem of ensuring that predicted probabilities align with observed class frequencies, is a basic desideratum for reliable prediction with machine learning systems. Calibration error is…

Machine Learning · Statistics 2026-03-02 Eugène Berta, Sacha Braun, David Holzmüller, Francis Bach, Michael I. Jordan

General Bayesian Policy Learning

This study proposes the General Bayes framework for policy learning. We consider decision problems in which a decision-maker chooses an action from an action set to maximize its expected welfare. Typical examples include treatment choice…

Machine Learning · Statistics 2026-03-02 Masahiro Kato

Fairness under Graph Uncertainty: Achieving Interventional Fairness with Partially Known Causal Graphs over Clusters of Variables

Algorithmic decisions about individuals require predictions that are not only accurate but also fair with respect to sensitive attributes such as gender and race. Causal notions of fairness align with legal requirements, yet many methods…

Machine Learning · Statistics 2026-03-02 Yoichi Chikahara

Partition Function Estimation under Bounded f-Divergence

We study the statistical complexity of estimating partition functions given sample access to a proposal distribution and an unnormalized density ratio for a target distribution. While partition function estimation is a classical problem,…

Machine Learning · Statistics 2026-03-02 Adam Block, Abhishek Shetty

Uncovering Physical Drivers of Dark Matter Halo Structures with Auxiliary-Variable-Guided Generative Models

Deep generative models (DGMs) compress high-dimensional data but often entangle distinct physical factors in their latent spaces. We present an auxiliary-variable-guided framework for disentangling representations of thermal…

Machine Learning · Statistics 2026-03-02 Arkaprabha Ganguli, Anirban Samaddar, Florian Kéruzoré, Nesar Ramachandra, Julie Bessac, Sandeep Madireddy, Emil Constantinescu

Implicit Bias and Convergence of Matrix Stochastic Mirror Descent

We investigate Stochastic Mirror Descent (SMD) with matrix parameters and vector-valued predictions, a framework relevant to multi-class classification and matrix completion problems. Focusing on the overparameterized regime, where the…

Machine Learning · Statistics 2026-03-02 Danil Akhtiamov, Reza Ghane, Omead Pooladzandi, Babak Hassibi

An operator splitting analysis of Wasserstein-Fisher-Rao gradient flows

Wasserstein-Fisher-Rao (WFR) gradient flows have been recently proposed as a powerful sampling tool that combines the advantages of pure Wasserstein (W) and pure Fisher-Rao (FR) gradient flows. Existing algorithmic developments implicitly…

Machine Learning · Statistics 2026-03-02 Francesca Romana Crucinio, Sahani Pathiraja

Embracing Discrete Search: A Reasonable Approach to Causal Structure Learning

We present FLOP (Fast Learning of Order and Parents), a score-based causal discovery algorithm for linear models. It pairs fast parent selection with iterative Cholesky-based score updates, cutting run-times over prior algorithms. This…

Machine Learning · Statistics 2026-03-02 Marcel Wienöbst, Leonard Henckel, Sebastian Weichwald

Estimating Treatment Effects with Independent Component Analysis

Independent Component Analysis (ICA) uses a measure of non-Gaussianity to identify latent sources from data and estimate their mixing coefficients (Shimizu et al., 2006). Meanwhile, higher-order Orthogonal Machine Learning (OML) exploits…

Machine Learning · Statistics 2026-03-02 Patrik Reizinger, Lester Mackey, Wieland Brendel, Rahul Krishnan

Conformal Prediction for Long-Tailed Classification

Many real-world classification problems, such as plant identification, have extremely long-tailed class distributions. In order for prediction sets to be useful in such settings, they should (i) provide good class-conditional coverage,…

Machine Learning · Statistics 2026-03-02 Tiffany Ding, Jean-Baptiste Fermanian, Joseph Salmon

Kernel spectral joint embeddings for high-dimensional noisy datasets using duo-landmark integral operators

Integrative analysis of multiple heterogeneous datasets has become standard practice in many research fields, especially in single-cell genomics and medical informatics. Existing approaches often suffer from limited power in capturing…

Machine Learning · Statistics 2026-03-02 Xiucai Ding, Rong Ma

Assessment of Spatio-Temporal Predictors in the Presence of Missing and Heterogeneous Data

Deep learning methods achieve remarkable predictive performance in modeling complex, large-scale data. However, assessing the quality of derived models has become increasingly challenging, as more classical statistical assumptions may no…

Machine Learning · Statistics 2026-03-02 Daniele Zambon, Cesare Alippi

Regular Fourier Features for Nonstationary Gaussian Processes

Simulating a Gaussian process requires sampling from a high-dimensional Gaussian distribution, which scales cubically with the number of sample locations. Spectral methods address this challenge by exploiting the Fourier representation,…

Machine Learning · Statistics 2026-02-27 Arsalan Jawaid, Abdullah Karatas, Jörg Seewig

Kernel Integrated $R^2$: A Measure of Dependence

We introduce kernel integrated $R^2$, a new measure of statistical dependence that combines the local normalization principle of the recently introduced integrated $R^2$ with the flexibility of reproducing kernel Hilbert spaces (RKHSs). The…

Machine Learning · Statistics 2026-02-27 Pouya Roudaki, Shakeel Gavioli-Akilagun, Florian Kalinke, Mona Azadkia, Zoltán Szabó