Machine Learning - Scifaro

On the performance of multi-fidelity and reduced-dimensional neural emulators for inference of physiological boundary conditions

Solving inverse problems in cardiovascular modeling is particularly challenging due to the high computational cost of running high-fidelity simulations. In this work, we focus on Bayesian parameter estimation and explore different methods…

Machine Learning · Statistics 2025-12-22 Chloe H. Choi, Andrea Zanoni, Daniele E. Schiavazzi, Alison L. Marsden

Clustering and Pruning in Causal Data Fusion

Data fusion, the process of combining observational and experimental data, can enable the identification of causal effects that would otherwise remain non-identifiable. Although identification algorithms have been developed for specific…

Machine Learning · Statistics 2025-12-22 Otto Tabell, Santtu Tikka, Juha Karvanen

The Stochastic Occupation Kernel (SOCK) Method for Learning Stochastic Differential Equations

We present a novel kernel-based method for learning multivariate stochastic differential equations (SDEs). The method follows a two-step procedure: we first estimate the drift term function, then the (matrix-valued) diffusion function given…

Machine Learning · Statistics 2025-12-22 Michael L. Wells, Kamel Lahouel, Bruno Jedynak

Constraint-based causal discovery with tiered background knowledge and latent variables in single or overlapping datasets

In this paper we consider the use of tiered background knowledge within constraint-based causal discovery. Our focus is on settings relaxing causal sufficiency, i.e. allowing for latent variables which may arise because relevant information…

Machine Learning · Statistics 2025-12-22 Christine W. Bang, Vanessa Didelez

Refined Analysis of Federated Averaging and Federated Richardson-Romberg

In this paper, we present a novel analysis of FedAvg with constant step size, relying on the Markov property of the underlying process. We demonstrate that the global iterates of the algorithm converge to a stationary distribution and…

Machine Learning · Statistics 2025-12-22 Paul Mangold, Alain Durmus, Aymeric Dieuleveut, Sergey Samsonov, Eric Moulines

Targeted Learning for Variable Importance

Variable importance is one of the most widely used measures for interpreting machine learning models, with significant interest from both the statistics and machine learning communities. Recently, increasing attention has been directed toward…

Machine Learning · Statistics 2025-12-22 Xiaohan Wang, Yunzhe Zhou, Giles Hooker

Adjusting Model Size in Continual Gaussian Processes: How Big is Big Enough?

Many machine learning models require setting a parameter that controls their size before training, e.g. the number of neurons in DNNs or of inducing points in GPs. Increasing capacity typically improves performance until all the information from…

Machine Learning · Statistics 2025-12-22 Guiomar Pescador-Barrios, Sarah Filippi, Mark van der Wilk

SCAFFLSA: Taming Heterogeneity in Federated Linear Stochastic Approximation and TD Learning

In this paper, we analyze the sample and communication complexity of the federated linear stochastic approximation (FedLSA) algorithm. We explicitly quantify the effects of local training with agent heterogeneity. We show that the…

Machine Learning · Statistics 2025-12-22 Paul Mangold, Sergey Samsonov, Safwan Labbi, Ilya Levin, Reda Alami, Alexey Naumov, Eric Moulines

Differentially private Bayesian tests

Differential privacy has emerged as a significant cornerstone of scientific hypothesis testing with confidential data. In reporting scientific discoveries, Bayesian tests are widely adopted since they effectively…

Machine Learning · Statistics 2025-12-22 Abhisek Chakraborty, Saptati Datta

Riemannian Stochastic Interpolants for Amorphous Particle Systems

Modern generative models hold great promise for accelerating diverse tasks involving the simulation of physical systems, but they must be adapted to the specific constraints of each domain. Significant progress has been made for…

Machine Learning · Statistics 2025-12-19 Louis Grenioux, Leonardo Galliano, Ludovic Berthier, Giulio Biroli, Marylou Gabrié

Advantages and limitations in the use of transfer learning for individual treatment effects in causal machine learning

Generalizing causal knowledge across diverse environments is challenging, especially when estimates from large-scale datasets must be applied to smaller or systematically different contexts, where external validity is critical. Model-based…

Machine Learning · Statistics 2025-12-19 Seyda Betul Aydin, Holger Brandt

DAG Learning from Zero-Inflated Count Data Using Continuous Optimization

We address network structure learning from zero-inflated count data by casting each node as a zero-inflated generalized linear model and optimizing a smooth, score-based objective under a directed acyclic graph constraint. Our Zero-Inflated…

Machine Learning · Statistics 2025-12-19 Noriaki Sato, Marco Scutari, Shuichi Kawano, Rui Yamaguchi, Seiya Imoto

BayesSum: Bayesian Quadrature in Discrete Spaces

This paper addresses the challenging computational problem of estimating intractable expectations over discrete domains. Existing approaches, including Monte Carlo and Russian Roulette estimators, are consistent but often require a large…

Machine Learning · Statistics 2025-12-19 Sophia Seulkee Kang, François-Xavier Briol, Toni Karvonen, Zonghao Chen

Bayesian Deep Learning for Discrete Choice

Discrete choice models (DCMs) are used to analyze individual decision-making in contexts such as transportation choices, political elections, and consumer preferences. DCMs play a central role in applied econometrics by enabling inference…

Machine Learning · Statistics 2025-12-19 Daniel F. Villarraga, Ricardo A. Daziano

An interpretation of the Brownian bridge as a physics-informed prior for the Poisson equation

Many inverse problems require reconstructing physical fields from limited and noisy data while incorporating known governing equations. A growing body of work within probabilistic numerics formalizes such tasks via Bayesian inference in…

Machine Learning · Statistics 2025-12-19 Alex Alberts, Ilias Bilionis

Nested subspace learning with flags

Many machine learning methods look for low-dimensional representations of the data. The underlying subspace can be estimated by first choosing a dimension $q$ and then optimizing a certain objective function over the space of…

Machine Learning · Statistics 2025-12-19 Tom Szwagier, Xavier Pennec

Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods

Discrete diffusion models have recently gained significant attention due to their ability to process complex discrete structures for language modeling. However, fine-tuning these models with policy gradient methods, as is commonly done in…

Machine Learning · Statistics 2025-12-19 Oussama Zekri, Nicolas Boullé

High-Dimensional Partial Least Squares: Spectral Analysis and Fundamental Limitations

Partial Least Squares (PLS) is a widely used method for data integration, designed to extract latent components shared across paired high-dimensional datasets. Despite decades of practical success, a precise theoretical understanding of its…

Machine Learning · Statistics 2025-12-18 Victor Léger, Florent Chatelain

A Teacher-Student Perspective on the Dynamics of Learning Near the Optimal Point

Near an optimal learning point of a neural network, the learning performance of gradient descent dynamics is dictated by the Hessian matrix of the loss function with respect to the network parameters. We characterize the Hessian…

Machine Learning · Statistics 2025-12-18 Carlos Couto, José Mourão, Mário A. T. Figueiredo, Pedro Ribeiro

Few-Shot Multimodal Medical Imaging: A Theoretical Framework

Medical imaging often operates under limited labeled data, especially in rare-disease and low-resource clinical environments. Existing multimodal and meta-learning approaches improve performance in these settings but lack a theoretical…

Machine Learning · Statistics 2025-12-18 Md Talha Mohsin, Ismail Abdulrashid