Machine Learning - Scifaro

Scalable Bayesian Monte Carlo: fast uncertainty estimation beyond deep ensembles

This work introduces a new method designed for Bayesian deep learning called scalable Bayesian Monte Carlo (SBMC). The method is comprised of a model and an algorithm. The model interpolates between a point estimator and the posterior. The…

Machine Learning · Statistics 2025-08-22 Xinzhu Liang, Joseph M. Lukens, Sanjaya Lohani, Brian T. Kirby, Thomas A. Searles, Xin Qiu, Kody J. H. Law

Boundary Detection Algorithm Inspired by Locally Linear Embedding

In the study of high-dimensional data, it is often assumed that the data set possesses an underlying lower-dimensional structure. A practical model for this structure is an embedded compact manifold with boundary. Since the underlying…

Machine Learning · Statistics 2025-08-22 Pei-Cheng Kuo, Nan Wu

Neural reproducing kernel Banach spaces and representer theorems for deep networks

Characterizing the function spaces defined by neural networks helps in understanding the corresponding learning models and their inductive bias. While in some limits neural networks correspond to function spaces that are Hilbert spaces, these…

Machine Learning · Statistics 2025-08-22 Francesca Bartolucci, Ernesto De Vito, Lorenzo Rosasco, Stefano Vigogna

The C-index Multiverse

Quantifying out-of-sample discrimination performance for time-to-event outcomes is a fundamental step for model evaluation and selection in the context of predictive modelling. The concordance index, or C-index, is a widely used metric for…

Machine Learning · Statistics 2025-08-21 Begoña B. Sierra, Colin McLean, Peter S. Hall, Catalina A. Vallejos

Comparing Model-agnostic Feature Selection Methods through Relative Efficiency

Feature selection and importance estimation in a model-agnostic setting is an ongoing challenge of significant interest. Wrapper methods are commonly used because they are typically model-agnostic, even though they are computationally…

Machine Learning · Statistics 2025-08-21 Chenghui Zheng, Garvesh Raskutti

Learning to Solve Related Linear Systems

Solving multiple parametrised related systems is an essential component of many numerical tasks, and learning from the already solved systems will make this process faster. In this work, we propose a novel probabilistic linear solver over…

Machine Learning · Statistics 2025-08-21 Disha Hegde, Jon Cockayne

Towards Understanding Gradient Dynamics of the Sliced-Wasserstein Distance via Critical Point Analysis

In this paper, we investigate the properties of the Sliced Wasserstein Distance (SW) when employed as an objective functional. The SW metric has gained significant interest in the optimal transport and machine learning literature, due to…

Machine Learning · Statistics 2025-08-21 Christophe Vauthier, Anna Korba, Quentin Mérigot

Parallelly Tempered Generative Adversarial Nets: Toward Stabilized Gradients

A generative adversarial network (GAN) has been a representative backbone model in generative artificial intelligence (AI) because of its powerful performance in capturing intricate data-generating processes. However, the GAN training is…

Machine Learning · Statistics 2025-08-21 Jinwon Sohn, Qifan Song

Comparison of parallel SMC and MCMC for Bayesian deep learning

This work systematically compares parallel implementations of consistent (asymptotically unbiased) Bayesian deep learning algorithms: sequential Monte Carlo sampler (SMC$_\parallel$) or Markov chain Monte Carlo (MCMC$_\parallel$). We…

Machine Learning · Statistics 2025-08-21 Xinzhu Liang, Joseph M. Lukens, Sanjaya Lohani, Brian T. Kirby, Thomas A. Searles, Xin Qiu, Kody J. H. Law

Estimation of Structural Causal Model via Sparsely Mixing Independent Component Analysis

We consider the problem of inferring the causal structure from observational data, especially when the structure is sparse. This type of problem is usually formulated as an inference of a directed acyclic graph (DAG) model. The linear…

Machine Learning · Statistics 2025-08-21 Kazuharu Harada, Hironori Fujisawa

A PC Algorithm for Max-Linear Bayesian Networks

Max-linear Bayesian networks (MLBNs) are a relatively recent class of structural equation models which arise when the random variables involved have heavy-tailed distributions. Unlike most directed graphical models, MLBNs are typically not…

Machine Learning · Statistics 2025-08-20 Carlos Améndola, Benjamin Hollering, Francesco Nowell

Generalisation and benign over-fitting for linear regression onto random functional covariates

We study the theoretical predictive performance of ridge and ridge-less least-squares regression when covariate vectors arise from evaluating $p$ random, mean-square continuous functions over a latent metric space at $n$ random and unobserved…

Machine Learning · Statistics 2025-08-20 Andrew Jones, Nick Whiteley

Online Conformal Selection with Accept-to-Reject Changes

Selecting a subset of promising candidates from a large pool is crucial across various scientific and real-world applications. Conformal selection offers a distribution-free and model-agnostic framework for candidate selection with…

Machine Learning · Statistics 2025-08-20 Kangdao Liu, Huajun Xi, Chi-Man Vong, Hongxin Wei

Structural Foundations for Leading Digit Laws: Beyond Probabilistic Mixtures

This article presents a modern deterministic framework for the study of leading significant digit distributions in numerical data. Rather than relying on traditional probabilistic or mixture-based explanations, we demonstrate that the…

Machine Learning · Statistics 2025-08-20 Vladimir Berman

Preference Models assume Proportional Hazards of Utilities

Approaches for estimating preferences from human-annotated data typically involve inducing a distribution over a ranked list of choices, such as the Plackett-Luce model. Indeed, modern AI alignment tools such as Reward Modelling and Direct…

Machine Learning · Statistics 2025-08-20 Chirag Nagpal

Rectifying Conformity Scores for Better Conditional Coverage

We present a new method for generating confidence sets within the split conformal prediction framework. Our method performs a trainable transformation of any given conformity score to improve conditional coverage while ensuring exact…

Machine Learning · Statistics 2025-08-20 Vincent Plassier, Alexander Fishkov, Victor Dheur, Mohsen Guizani, Souhaib Ben Taieb, Maxim Panov, Eric Moulines

Contrastive Learning on Multimodal Analysis of Electronic Health Records

Electronic health record (EHR) systems contain a wealth of multimodal clinical data, including structured data like clinical codes and unstructured data such as clinical notes. However, many existing EHR-focused studies have traditionally…

Machine Learning · Statistics 2025-08-20 Tianxi Cai, Feiqing Huang, Ryumei Nakada, Linjun Zhang, Doudou Zhou

Shapley Values: Paired-Sampling Approximations

Originally introduced in cooperative game theory, Shapley values have become a very popular tool to explain machine learning predictions. Based on Shapley's fairness axioms, every input (feature component) receives credit for how it contributes…

Machine Learning · Statistics 2025-08-19 Michael Mayer, Mario V. Wüthrich

Simulation-Based Inference: A Practical Guide

A central challenge in many areas of science and engineering is to identify model parameters that are consistent with prior knowledge and empirical data. Bayesian inference offers a principled framework for this task, but can be…

Machine Learning · Statistics 2025-08-19 Michael Deistler, Jan Boelts, Peter Steinbach, Guy Moss, Thomas Moreau, Manuel Gloeckler, Pedro L. C. Rodrigues, Julia Linhart, Janne K. Lappalainen, Benjamin Kurt Miller, Pedro J. Gonçalves, Jan-Matthis Lueckmann, Cornelius Schröder, Jakob H. Macke

The path to a goal: Understanding soccer possessions via path signatures

We present a novel framework for predicting next actions in soccer possessions by leveraging path signatures to encode their complex spatio-temporal structure. Unlike existing approaches, we do not rely on fixed historical windows and…

Machine Learning · Statistics 2025-08-19 David Hirnschall, Robert Bajons