Machine Learning - Scifaro

Demographic Parity Tails for Regression

Demographic parity (DP) is a widely studied fairness criterion in regression, enforcing independence between the predictions and sensitive attributes. However, constraining the entire distribution can degrade predictive accuracy and may be…

Machine Learning · Statistics · 2026-04-03 · Naht Sinh Le, Christophe Denis, Mohamed Hebiri
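To make the full-distribution constraint concrete, here is a minimal sketch (not the authors' method) that measures how far a regressor is from demographic parity on a finite sample. It takes the largest Kolmogorov–Smirnov distance between the empirical distributions of predictions in each sensitive group. The names `ecdf` and `dp_gap_ks` are hypothetical, the KS distance is only one of several possible discrepancy measures, and the paper's tail-focused relaxation is not reproduced here.

```python
from bisect import bisect_right
from itertools import combinations


def ecdf(values):
    """Return the empirical CDF of `values` as a function t -> P(V <= t)."""
    xs = sorted(values)
    n = len(xs)
    return lambda t: bisect_right(xs, t) / n


def dp_gap_ks(preds, groups):
    """Largest pairwise KS distance between group-conditional prediction CDFs.

    0.0 means the empirical prediction distributions of all groups coincide
    (exact demographic parity on this sample); 1.0 means some pair of groups
    has completely separated predictions.
    """
    by_group = {}
    for p, g in zip(preds, groups):
        by_group.setdefault(g, []).append(p)
    cdfs = [ecdf(v) for v in by_group.values()]
    # Empirical CDFs only jump at observed predictions, so the supremum
    # over t is attained on the pooled set of predicted values.
    grid = sorted(set(preds))
    gap = 0.0
    for F, G in combinations(cdfs, 2):
        gap = max(gap, max(abs(F(t) - G(t)) for t in grid))
    return gap
```

For example, `dp_gap_ks([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"])` returns 1.0, because the two groups' predictions do not overlap at all. Because the KS distance is a supremum over the whole real line, it reacts to discrepancies anywhere in the distribution, which is exactly the kind of global constraint the abstract argues can be too restrictive.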

A Novel Theoretical Analysis for Clustering Heteroscedastic Gaussian Data without Knowledge of the Number of Clusters

This paper addresses the problem of clustering measurement vectors that are heteroscedastic in that they can have different covariance matrices. From the assumption that the measurement vectors within a given cluster are Gaussian…

Machine Learning · Statistics · 2026-04-03 · Dominique Pastor, Elsa Dupraz, Ismail Hbilou, Guillaume Ansel

Learning in Prophet Inequalities with Noisy Observations

We study the prophet inequality, a fundamental problem in online decision-making and optimal stopping, in a practical setting where rewards are observed only through noisy realizations and reward distributions are unknown. At each stage,…

Machine Learning · Statistics · 2026-04-03 · Jung-hun Kim, Vianney Perchet
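As a reference point for the classical, noise-free version of the problem, the sketch below simulates a single-threshold stopping rule with threshold T = E[max_i X_i] / 2, a known rule that guarantees at least half of the prophet's expected reward when the distributions are known exactly. Here E[max] is estimated by Monte Carlo from samplers the caller supplies, so the distributions are assumed to be simulable. This does not model the noisy observations or the learning studied in the paper, and the names `threshold_stop`, `draw`, and `prophet_ratio` are hypothetical.

```python
import random


def threshold_stop(values, threshold):
    """Single-threshold rule: accept the first value >= threshold.

    Returns the accepted reward, or 0.0 if every value is rejected.
    """
    for v in values:
        if v >= threshold:
            return v
    return 0.0


def draw(samplers, rng):
    """Draw one independent reward per stage; each sampler takes an RNG."""
    return [s(rng) for s in samplers]


def prophet_ratio(samplers, n_sims=20000, seed=0):
    """Monte Carlo estimate of E[reward of the rule] / E[max of the rewards].

    Assumption: distributions are simulable, so T = E[max] / 2 is
    estimated from an independent batch before evaluating the rule.
    """
    rng = random.Random(seed)
    est_max = sum(max(draw(samplers, rng)) for _ in range(n_sims)) / n_sims
    threshold = est_max / 2
    reward_total = 0.0
    max_total = 0.0
    for _ in range(n_sims):
        xs = draw(samplers, rng)
        reward_total += threshold_stop(xs, threshold)
        max_total += max(xs)
    return reward_total / max_total
```

For five i.i.d. Uniform(0, 1) rewards, `prophet_ratio([lambda rng: rng.random()] * 5)` estimates the ratio achieved by the rule. Because the threshold is itself estimated from samples, the 1/2 guarantee applies only approximately here, which hints at why unknown and noisily observed distributions call for a separate analysis.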

Random Coordinate Descent on the Wasserstein Space of Probability Measures

Optimization over the space of probability measures endowed with the Wasserstein-2 geometry is central to modern machine learning and mean-field modeling. However, traditional methods relying on full Wasserstein gradients often suffer from…

Machine Learning · Statistics · 2026-04-03 · Yewei Xu, Qin Li

Operator Learning for Smoothing and Forecasting

Machine learning has opened new frontiers in purely data-driven algorithms for data assimilation in, and for forecasting of, dynamical systems; the resulting methods are showing some promise. However, in contrast to model-driven algorithms,…

Machine Learning · Statistics · 2026-04-03 · Edoardo Calvello, Elizabeth Carlson, Nikola Kovachki, Michael N. Manta, Andrew M. Stuart

Graph-Informed Adversarial Modeling: Infimal Subadditivity of Interpolative Divergences

We study adversarial learning when the target distribution factorizes according to a known Bayesian network. For interpolative divergences, including $(f,\Gamma)$-divergences, we prove a new infimal subadditivity principle showing that,…

Machine Learning · Statistics · 2026-04-03 · Panagiota Birmpa, Eric Joseph Hall

Contextual Distributionally Robust Optimization with Causal and Continuous Structure: An Interpretable and Tractable Approach

In this paper, we introduce a framework for contextual distributionally robust optimization (DRO) that considers the causal and continuous structure of the underlying distribution by developing interpretable and tractable decision rules…

Machine Learning · Statistics · 2026-04-03 · Fenglin Zhang, Jie Wang

Coarsening Causal DAG Models

Directed acyclic graphical (DAG) models are a powerful tool for representing causal relationships among jointly distributed random variables, especially concerning data from across different experimental settings. However, it is not always…

Machine Learning · Statistics · 2026-04-03 · Francisco Madaleno, Pratik Misra, Alex Markham

Learning with Incomplete Context: Linear Contextual Bandits with Pretrained Imputation

The rise of large-scale pretrained models has made it feasible to generate predictive or synthetic features at low cost, raising the question of how to incorporate such surrogate predictions into downstream decision-making. We study this…

Machine Learning · Statistics · 2026-04-03 · Hao Yan, Heyan Zhang, Yongyi Guo

Adaptive Coverage Policies in Conformal Prediction

Traditional conformal prediction methods construct prediction sets such that the true label falls within the set with a user-specified coverage level. However, poorly chosen coverage levels can result in uninformative predictions, either…

Machine Learning · Statistics · 2026-04-03 · Etienne Gauthier, Francis Bach, Michael I. Jordan
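For reference, the traditional fixed-coverage construction that the abstract contrasts with can be sketched as split conformal prediction, shown here in its regression form. Given absolute residuals on a held-out calibration set, the interval uses the ceil((n + 1)(1 - alpha))-th smallest residual, which yields marginal coverage of at least 1 - alpha under exchangeability. The name `split_conformal_interval` and the absolute-residual score are illustrative assumptions; this is the standard fixed-level baseline, not the adaptive coverage policy the paper proposes.

```python
import math


def split_conformal_interval(cal_residuals, y_hat, alpha):
    """Split conformal interval [y_hat - q, y_hat + q] at miscoverage alpha.

    cal_residuals: absolute residuals |y_i - f(x_i)| on a calibration set
    that was not used to fit the predictor f.
    """
    n = len(cal_residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only the trivial
        # (unbounded) interval achieves the requested coverage.
        return (-math.inf, math.inf)
    q = sorted(cal_residuals)[k - 1]  # k-th smallest residual
    return (y_hat - q, y_hat + q)
```

With seven calibration residuals and alpha = 0.25, k = ceil(8 × 0.75) = 6, so the sixth-smallest residual sets the half-width. Lowering alpha to 0.05 pushes k past n and the interval becomes unbounded, a small illustration of how a poorly chosen coverage level can produce uninformative predictions.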

On the Adversarial Robustness of Learning-based Conformal Novelty Detection

This paper studies the adversarial robustness of conformal novelty detection. In particular, we focus on two powerful learning-based frameworks that come with finite-sample false discovery rate (FDR) control: one is AdaDetect (by Marandon…

Machine Learning · Statistics · 2026-04-03 · Daofu Zhang, Mehrdad Pournaderi, Hanne M. Clifford, Yu Xiang, Pramod K. Varshney

Intervening to Learn and Compose Causally Disentangled Representations

In designing generative models, it is commonly believed that in order to learn useful latent structure, we face a fundamental tension between expressivity and structure. In this paper we challenge this view by proposing a new approach to…

Machine Learning · Statistics · 2026-04-03 · Alex Markham, Isaac Hirsch, Jeri A. Chang, Liam Solus, Bryon Aragam

AICO: Feature Significance Tests for Supervised Learning

Machine learning is central to modern science, industry, and policy, yet its predictive power often comes at the cost of transparency: we rarely know which input features truly drive a model's predictions. Without such understanding,…

Machine Learning · Statistics · 2026-04-03 · Kay Giesecke, Enguerrand Horel, Chartsiri Jirachotkulthorn

Prognostics for Autonomous Deep-Space Habitat Health Management under Multiple Unknown Failure Modes

Deep-space habitats (DSHs) are safety-critical systems that must operate autonomously for long periods, often beyond the reach of ground-based maintenance or expert intervention. Monitoring system health and anticipating failures are…

Machine Learning · Statistics · 2026-04-03 · Benjamin Peters, Ayush Mohanty, Xiaolei Fang, Stephen K. Robinson, Nagi Gebraeel

Bridging Structured Knowledge and Data: A Unified Framework with Finance Applications

We develop Structured-Knowledge-Informed Neural Networks (SKINNs), a unified estimation framework that embeds theoretical, simulated, previously learned, or cross-domain insights as differentiable constraints within flexible neural function…

Machine Learning · Statistics · 2026-04-02 · Yi Cao, Zexun Chen, Lin William Cong, Heqing Shi

Deconfounding Scores and Representation Learning for Causal Effect Estimation with Weak Overlap

Overlap, also known as positivity, is a key condition for causal treatment effect estimation. Many popular estimators suffer from high variance and become brittle when features differ strongly across treatment groups. This is especially…

Machine Learning · Statistics · 2026-04-02 · Oscar Clivio, Alexander D'Amour, Alexander Franks, David Bruns-Smith, Chris Holmes, Avi Feller

Inverse-Free Sparse Variational Gaussian Processes

Gaussian processes (GPs) offer appealing properties but are costly to train at scale. Sparse variational GP (SVGP) approximations reduce cost yet still rely on Cholesky decompositions of kernel matrices, ill-suited to low-precision,…

Machine Learning · Statistics · 2026-04-02 · Stefano Cortinovis, Laurence Aitchison, Stefanos Eleftheriadis, Mark van der Wilk

Scenario theory for multi-criteria data-driven decision making

The scenario approach provides a powerful data-driven framework for designing solutions under uncertainty with rigorous probabilistic robustness guarantees. Existing theory, however, primarily addresses assessing robustness with respect to…

Machine Learning · Statistics · 2026-04-02 · Simone Garatti, Lucrezia Manieri, Alessandro Falsone, Algo Carè, Marco C. Campi, Maria Prandini

Denoising distances beyond the volumetric barrier

We study the problem of reconstructing the latent geometry of a $d$-dimensional Riemannian manifold from a random geometric graph. While recent works have made significant progress in manifold recovery from random geometric graphs, and more…

Machine Learning · Statistics · 2026-04-02 · Han Huang, Pakawut Jiradilok, Elchanan Mossel

Breaking Data Symmetry is Needed For Generalization in Feature Learning Kernels

Grokking occurs when a model achieves high training accuracy but generalizes to unseen test points only much later. This phenomenon was initially observed on a class of algebraic problems, such as learning modular arithmetic…

Machine Learning · Statistics · 2026-04-02 · Marcel Tomàs Bernal, Neil Rohit Mallinar, Mikhail Belkin