Machine Learning - Scifaro

Kernel K-means clustering of distributional data

We consider the problem of clustering a sample of probability distributions from a random distribution on $\mathbb R^p$. Our proposed partitioning method makes use of a symmetric, positive-definite kernel $k$ and its associated reproducing…

Machine Learning · Statistics 2025-09-23 Amparo Baíllo, Jose R. Berrendero, Martín Sánchez-Signorini

Fréchet Geodesic Boosting

Gradient boosting has become a cornerstone of machine learning, enabling base learners such as decision trees to achieve exceptional predictive performance. While existing algorithms primarily handle scalar or Euclidean outputs,…

Machine Learning · Statistics 2025-09-23 Yidong Zhou, Su I Iao, Hans-Georg Müller

Robust, Online, and Adaptive Decentralized Gaussian Processes

Gaussian processes (GPs) offer a flexible, uncertainty-aware framework for modeling complex signals, but scale cubically with data, assume static targets, and are brittle to outliers, limiting their applicability in large-scale problems…

Machine Learning · Statistics 2025-09-23 Fernando Llorente, Daniel Waxman, Sanket Jantre, Nathan M. Urban, Susan E. Minkoff

Robust Mixture Models for Algorithmic Fairness Under Latent Heterogeneity

Standard machine learning models optimized for average performance often fail on minority subgroups and lack robustness to distribution shifts. This challenge worsens when subgroups are latent and affected by complex interactions among…

Machine Learning · Statistics 2025-09-23 Siqi Li, Molei Liu, Ziye Tian, Chuan Hong, Nan Liu

Risk Comparisons in Linear Regression: Implicit Regularization Dominates Explicit Regularization

Existing theory suggests that for linear regression problems categorized by capacity and source conditions, gradient descent (GD) is always minimax optimal, while both ridge regression and online stochastic gradient descent (SGD) are…

Machine Learning · Statistics 2025-09-23 Jingfeng Wu, Peter L. Bartlett, Jason D. Lee, Sham M. Kakade, Bin Yu

DoubleGen: Debiased Generative Modeling of Counterfactuals

Generative models for counterfactual outcomes face two key sources of bias. Confounding bias arises when approaches fail to account for systematic differences between those who receive the intervention and those who do not. Misspecification…

Machine Learning · Statistics 2025-09-23 Alex Luedtke, Kenji Fukumizu

System-Level Uncertainty Quantification with Multiple Machine Learning Models: A Theoretical Framework

ML models have errors when used for predictions. The errors are unknown but can be quantified by model uncertainty. When multiple ML models are trained using the same training points, their model uncertainties may be statistically…

Machine Learning · Statistics 2025-09-23 Xiaoping Du

Conditional Multidimensional Scaling with Incomplete Conditioning Data

Conditional multidimensional scaling seeks a low-dimensional configuration from pairwise dissimilarities, in the presence of other known features. By taking advantage of available data of the known features, conditional multidimensional…

Machine Learning · Statistics 2025-09-23 Anh Tuan Bui

Low-Rank Adaptation of Evolutionary Deep Neural Networks for Efficient Learning of Time-Dependent PDEs

We study the Evolutionary Deep Neural Network (EDNN) framework for accelerating numerical solvers of time-dependent partial differential equations (PDEs). We introduce a Low-Rank Evolutionary Deep Neural Network (LR-EDNN), which constrains…

Machine Learning · Statistics 2025-09-23 Jiahao Zhang, Shiheng Zhang, Guang Lin

Learning Massive-scale Partial Correlation Networks in Clinical Multi-omics Studies with HP-ACCORD

Graphical model estimation from multi-omics data requires a balance between statistical estimation performance and computational scalability. We introduce a novel pseudolikelihood-based graphical model framework that reparameterizes the…

Machine Learning · Statistics 2025-09-23 Sungdong Lee, Joshua Bang, Youngrae Kim, Hyungwon Choi, Sang-Yun Oh, Joong-Ho Won

Validation-Free Sparse Learning: A Phase Transition Approach to Feature Selection

The growing environmental footprint of artificial intelligence (AI), especially in terms of storage and computation, calls for more frugal and interpretable models. Sparse models (e.g., linear, neural networks) offer a promising solution by…

Machine Learning · Statistics 2025-09-23 Sylvain Sardy, Maxime van Cutsem, Xiaoyu Ma

Physics-informed kernel learning

Physics-informed machine learning typically integrates physical priors into the learning process by minimizing a loss function that includes both a data-driven term and a partial differential equation (PDE) regularization. Building on the…

Machine Learning · Statistics 2025-09-23 Nathan Doumèche, Francis Bach, Gérard Biau, Claire Boyer

A more efficient method for large-sample model-free feature screening via multi-armed bandits

We consider the model-free feature screening in large-scale ultrahigh-dimensional data analysis. Existing feature screening methods often face substantial computational challenges when dealing with large sample sizes. To alleviate the…

Machine Learning · Statistics 2025-09-22 Xiaxue Ouyang, Xinlai Kang, Mengyu Li, Zhenxing Dou, Jun Yu, Cheng Meng

What is a good matching of probability measures? A counterfactual lens on transport maps

Coupling probability measures lies at the core of many problems in statistics and machine learning, from domain adaptation to transfer learning and causal inference. Yet, even when restricted to deterministic transports, such couplings are…

Machine Learning · Statistics 2025-09-22 Lucas De Lara, Luca Ganassali

Model-free algorithms for fast node clustering in SBM type graphs and application to social role inference in animals

We propose a novel family of model-free algorithms for node clustering and parameter inference in graphs generated from the Stochastic Block Model (SBM), a fundamental framework in community detection. Drawing inspiration from the Lloyd…

Machine Learning · Statistics 2025-09-22 Bertrand Cloez, Adrien Cotil, Jean-Baptiste Menassol, Nicolas Verzelen

Interpretable Network-assisted Random Forest+

Machine learning algorithms often assume that training samples are independent. When data points are connected by a network, the induced dependency between samples is both a challenge, reducing effective sample size, and an opportunity to…

Machine Learning · Statistics 2025-09-22 Tiffany M. Tang, Elizaveta Levina, Ji Zhu

SETrLUSI: Stochastic Ensemble Multi-Source Transfer Learning Using Statistical Invariant

In transfer learning, a source domain often carries diverse knowledge, and different domains usually emphasize different types of knowledge. Different from handling only a single type of knowledge from all domains in traditional transfer…

Machine Learning · Statistics 2025-09-22 Chunna Li, Yiwei Song, Yuanhai Shao

MoCA: Multi-modal Cross-masked Autoencoder for Digital Health Measurements

Wearable devices enable continuous multi-modal physiological and behavioral monitoring, yet analysis of these data streams faces fundamental challenges including the lack of gold-standard labels and incomplete sensor data. While…

Machine Learning · Statistics 2025-09-22 Howon Ryu, Yuliang Chen, Yacun Wang, Andrea Z. LaCroix, Chongzhi Di, Loki Natarajan, Yu Wang, Jingjing Zou

Gradient-Free Sequential Bayesian Experimental Design via Interacting Particle Systems

We introduce a gradient-free framework for Bayesian Optimal Experimental Design (BOED) in sequential settings, aimed at complex systems where gradient information is unavailable. Our method combines Ensemble Kalman Inversion (EKI) for…

Machine Learning · Statistics 2025-09-22 Robert Gruhlke, Matei Hanu, Claudia Schillings, Philipp Wacker

A Framework for Improving the Reliability of Black-box Variational Inference

Black-box variational inference (BBVI) now sees widespread use in machine learning and statistics as a fast yet flexible alternative to Markov chain Monte Carlo methods for approximate Bayesian inference. However, stochastic optimization…

Machine Learning · Statistics 2025-09-22 Manushi Welandawe, Michael Riis Andersen, Aki Vehtari, Jonathan H. Huggins