Machine Learning - Scifaro

Physics-informed machine learning: A mathematical framework with applications to time series forecasting

Physics-informed machine learning (PIML) is an emerging framework that integrates physical knowledge into machine learning models. This physical prior often takes the form of a partial differential equation (PDE) system that the regression…

Machine Learning · Statistics 2025-07-15 Nathan Doumèche

Hyperspherical Variational Autoencoders Using Efficient Spherical Cauchy Distribution

We propose a novel variational autoencoder (VAE) architecture that employs a spherical Cauchy (spCauchy) latent distribution. Unlike traditional Gaussian latent spaces or the widely used von Mises-Fisher (vMF) distribution, spCauchy…

Machine Learning · Statistics 2025-07-15 Lukas Sablica, Kurt Hornik

Risk Bounds For Distributional Regression

This work examines risk bounds for nonparametric distributional regression estimators. For convex-constrained distributional regression, general upper bounds are established for the continuous ranked probability score (CRPS) and the…

Machine Learning · Statistics 2025-07-15 Carlos Misael Madrid Padilla, Oscar Hernan Madrid Padilla, Sabyasachi Chatterjee

Discrimination-free Insurance Pricing with Privatized Sensitive Attributes

Fairness has emerged as a critical consideration in the landscape of machine learning algorithms, particularly as AI continues to transform decision-making across societal domains. To ensure that these algorithms are free from bias and do…

Machine Learning · Statistics 2025-07-15 Tianhe Zhang, Suhan Liu, Peng Shi

LITE: Efficiently Estimating Gaussian Probability of Maximality

We consider the problem of computing the probability of maximality (PoM) of a Gaussian random vector, i.e., the probability for each dimension to be maximal. This is a key challenge in applications ranging from Bayesian optimization to…

Machine Learning · Statistics 2025-07-15 Nicolas Menet, Jonas Hübotter, Parnian Kassraie, Andreas Krause

A Unified View on Learning Unnormalized Distributions via Noise-Contrastive Estimation

This paper studies a family of estimators based on noise-contrastive estimation (NCE) for learning unnormalized distributions. The main contribution of this work is to provide a unified perspective on various methods for learning…

Machine Learning · Statistics 2025-07-15 J. Jon Ryu, Abhin Shah, Gregory W. Wornell

Deep Neural Network Based Accelerated Failure Time Models using Rank Loss

An accelerated failure time (AFT) model assumes a log-linear relationship between failure times and a set of covariates. In contrast to other popular survival models that work on hazard functions, the effects of covariates are directly on…

Machine Learning · Statistics 2025-07-15 Gwangsu Kim, Sangwook Kang

Data Depth as a Risk

Data depths are score functions that quantify, in an unsupervised fashion, how central a point is within a distribution, with numerous applications such as anomaly detection, multivariate or functional data analysis, arising across various…

Machine Learning · Statistics 2025-07-14 Arturo Castellanos, Pavlo Mozharovskyi

Mallows Model with Learned Distance Metrics: Sampling and Maximum Likelihood Estimation

The Mallows model is a widely used probabilistic framework for learning from ranking data, with applications ranging from recommendation systems and voting to aligning language models with human preferences…

Machine Learning · Statistics 2025-07-14 Yeganeh Alimohammadi, Kiana Asgari

Feature learning from non-Gaussian inputs: the case of Independent Component Analysis in high dimensions

Deep neural networks learn structured features from complex, non-Gaussian inputs, but the mechanisms behind this process remain poorly understood. Our work is motivated by the observation that the first-layer filters learnt by deep…

Machine Learning · Statistics 2025-07-14 Fabiola Ricci, Lorenzo Bardone, Sebastian Goldt

Communities in the Kuramoto Model: Dynamics and Detection via Path Signatures

The behavior of multivariate dynamical processes is often governed by underlying structural connections that relate the components of the system. For example, brain activity, which is often measured via time series, is determined by an…

Machine Learning · Statistics 2025-07-14 Tâm Johan Nguyên, Darrick Lee, Bernadette Jana Stolz

Leveraging priors on distribution functions for multi-arm bandits

We introduce Dirichlet Process Posterior Sampling (DPPS), a Bayesian non-parametric algorithm for multi-arm bandits based on Dirichlet Process (DP) priors. Like Thompson sampling, DPPS is a probability-matching algorithm, i.e., it plays an…

Machine Learning · Statistics 2025-07-14 Sumit Vashishtha, Odalric-Ambrym Maillard

Multiaccuracy and Multicalibration via Proxy Groups

As the use of predictive machine learning algorithms increases in high-stakes decision-making, it is imperative that these algorithms are fair across sensitive groups. However, measuring and enforcing fairness in real-world applications can…

Machine Learning · Statistics 2025-07-14 Beepul Bharti, Mary Versa Clemens-Sewall, Paul H. Yi, Jeremias Sulam

Overcoming Fairness Trade-offs via Pre-processing: A Causal Perspective

Training machine learning models for fair decisions faces two key challenges: the fairness-accuracy trade-off results from enforcing fairness, which weakens the model's predictive performance relative to an unconstrained model. The…

Machine Learning · Statistics 2025-07-14 Charlotte Leininger, Simon Rittel, Ludwig Bothmann

On the Gaussian process limit of Bayesian Additive Regression Trees

Bayesian Additive Regression Trees (BART) is an increasingly popular nonparametric Bayesian regression technique. It is a sum-of-decision-trees model, and is in some sense the Bayesian version of boosting. In the limit of infinite trees, it…

Machine Learning · Statistics 2025-07-14 Giacomo Petrillo

Local transfer learning Gaussian process modeling, with applications to surrogate modeling of expensive computer simulators

A critical bottleneck for scientific progress is the costly nature of computer simulations for complex systems. Surrogate models provide an appealing solution: such models are trained on simulator evaluations, then used to emulate and…

Machine Learning · Statistics 2025-07-14 Xinming Wang, Simon Mak, John Miller, Jianguo Wu

Local Flow Matching Generative Models

Flow Matching (FM) is a simulation-free method for learning a continuous and invertible flow to interpolate between two distributions, and in particular to generate data from noise. Inspired by the variational nature of the diffusion…

Machine Learning · Statistics 2025-07-14 Chen Xu, Xiuyuan Cheng, Yao Xie

Hess-MC2: Sequential Monte Carlo Squared using Hessian Information and Second Order Proposals

When performing Bayesian inference using Sequential Monte Carlo (SMC) methods, two considerations arise: the accuracy of the posterior approximation and computational efficiency. To address computational demands, Sequential Monte Carlo…

Machine Learning · Statistics 2025-07-11 Joshua Murphy, Conor Rosato, Andrew Millard, Lee Devlin, Paul Horridge, Simon Maskell

Topological Machine Learning with Unreduced Persistence Diagrams

Supervised machine learning pipelines trained on features derived from persistent homology have been experimentally observed to ignore much of the information contained in a persistence diagram. Computing persistence diagrams is often the…

Machine Learning · Statistics 2025-07-11 Nicole Abreu, Parker B. Edwards, Francis Motta

LARP: Learner-Agnostic Robust Data Prefiltering

The widespread availability of large public datasets is a key factor behind the recent successes of statistical inference and machine learning methods. However, these datasets often contain some low-quality or contaminated data, to which…

Machine Learning · Statistics 2025-07-11 Kristian Minchev, Dimitar Iliev Dimitrov, Nikola Konstantinov