Machine Learning - Scifaro

Understanding Overparametrization in Survival Models through Interpolation

Classical statistical learning theory predicts a U-shaped relationship between test loss and model capacity, driven by the bias-variance trade-off. Recent advances in modern machine learning have revealed a more complex pattern,…

Machine Learning · Statistics · 2026-04-23 · Yin Liu, Jianwen Cai, Didong Li

Control Consistency Losses for Diffusion Bridges

Simulating the conditioned dynamics of diffusion processes, given their initial and terminal states, is an important but challenging problem in the sciences. The difficulty is particularly pronounced for rare events, for which the…

Machine Learning · Statistics · 2026-04-23 · Samuel Howard, Nikolas Nüsken, Jakiw Pidstrigach

Accumulated Aggregated D-Optimal Designs for Estimating Main Effects in Black-Box Models

Estimating how individual input variables affect the output of a black-box model is a central task in explainable machine learning. However, existing methods suffer from two key limitations: sensitivity to out-of-distribution (OOD)…

Machine Learning · Statistics · 2026-04-23 · Chih-Yu Chang, Ming-Chung Chang

Analytical Extraction of Conditional Sobol' Indices via Basis Decomposition of Polynomial Chaos Expansions

In uncertainty quantification, evaluating sensitivity measures under specific conditions (i.e., conditional Sobol' indices) is essential for systems with parameterized responses, such as spatial fields or varying operating conditions.…

Machine Learning · Statistics · 2026-04-22 · Shijie Zhong, Jiangfeng Fu

Fast estimation of Gaussian mixture components via centering and singular value thresholding

Estimating the number of components is a fundamental challenge in unsupervised learning, particularly when dealing with high-dimensional data with many components or severely imbalanced component sizes. This paper addresses this challenge…

Machine Learning · Statistics · 2026-04-22 · Huan Qing

PriorGuide: Test-Time Prior Adaptation for Simulation-Based Inference

Amortized simulator-based inference offers a powerful framework for tackling Bayesian inference in computational fields such as engineering or neuroscience, increasingly leveraging modern generative methods like diffusion models to map…

Machine Learning · Statistics · 2026-04-22 · Yang Yang, Severi Rissanen, Paul E. Chang, Nasrulloh Loka, Daolang Huang, Arno Solin, Markus Heinonen, Luigi Acerbi

Quantifying Data Similarity Using Cross Learning

Measuring dataset similarity is fundamental in machine learning, particularly for transfer learning and domain adaptation. In the context of supervised learning, most existing approaches quantify the similarity of two datasets based on their…

Machine Learning · Statistics · 2026-04-22 · Shudong Sun, Hao Helen Zhang, Joseph C Watkins

Efficient Autoregressive Inference for Transformer Probabilistic Models

Set-based transformer models for amortized probabilistic inference and meta-learning, such as neural processes, prior-fitted networks, and tabular foundation models, excel at single-pass marginal prediction. However, many applications…

Machine Learning · Statistics · 2026-04-22 · Conor Hassan, Nasrulloh Loka, Cen-You Li, Daolang Huang, Paul E. Chang, Yang Yang, Francesco Silvestrin, Samuel Kaski, Luigi Acerbi

Energy-Weighted Flow Matching: Unlocking Continuous Normalizing Flows for Efficient and Scalable Boltzmann Sampling

Sampling from unnormalized target distributions, e.g., Boltzmann distributions $\mu_{\text{target}}(x) \propto \exp(-E(x)/T)$, is fundamental to many scientific applications yet computationally challenging due to complex, high-dimensional…

Machine Learning · Statistics · 2026-04-22 · Niclas Dern, Lennart Redl, Sebastian Pfister, Marcel Kollovieh, David Lüdke, Stephan Günnemann
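For readers new to the target density named above, here is a minimal numpy sketch of Boltzmann weights over a small, hypothetical set of discrete states. It is a generic illustration of $\mu_{\text{target}}(x) \propto \exp(-E(x)/T)$, not the paper's Energy-Weighted Flow Matching method; the energies and temperatures are made up. With a handful of states the normalizing constant is a plain sum, whereas for high-dimensional continuous $x$ it becomes an integral that usually cannot be computed, which is a large part of why sampling is hard.

```python
import numpy as np

def boltzmann_weights(energies, temperature):
    """Normalized probabilities p_i proportional to exp(-E_i / T) over a finite set of states."""
    logits = -np.asarray(energies, dtype=float) / temperature
    # Subtracting the max avoids overflow; the shift cancels after normalization.
    logits -= logits.max()
    unnormalized = np.exp(logits)
    # With finitely many states the normalizing constant is just this sum.
    return unnormalized / unnormalized.sum()

energies = [0.0, 1.0, 2.0]  # hypothetical energies, arbitrary units
p_cold = boltzmann_weights(energies, temperature=0.5)
p_hot = boltzmann_weights(energies, temperature=5.0)
print(p_cold.round(3), p_hot.round(3))
```

Lowering T concentrates probability on the lowest-energy state; raising it flattens the distribution toward uniform.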

Highly Efficient and Effective LLMs with Multi-Boolean Architectures

Weight binarization has emerged as a promising strategy to reduce the complexity of large language models (LLMs). Existing approaches fall into post-training binarization, which is simple but causes severe performance loss, and…

Machine Learning · Statistics · 2026-04-22 · Ba-Hien Tran, Van Minh Nguyen

LatticeVision: Image to Image Networks for Modeling Non-Stationary Spatial Data

In many applications, we wish to fit a parametric statistical model to a small ensemble of spatially distributed random variables ('fields'). However, parameter inference using maximum likelihood estimation (MLE) is computationally…

Machine Learning · Statistics · 2026-04-22 · Antony Sikorski, Michael Ivanitskiy, Nathan Lenssen, Douglas Nychka, Daniel McKenzie

A Review of Causal Decision Making

To make effective decisions, it is important to have a thorough understanding of the causal relationships among actions, environments, and outcomes. This review aims to surface three crucial aspects of decision-making through a causal lens:…

Machine Learning · Statistics · 2026-04-22 · Lin Ge, Hengrui Cai, Runzhe Wan, Yang Xu, Rui Song

Batch-Adaptive Causal Annotations

Estimating the causal effects of interventions is crucial to policy and decision-making, yet outcome data are often missing or subject to non-standard measurement error. While ground-truth outcomes can sometimes be obtained through costly…

Machine Learning · Statistics · 2026-04-22 · Ezinne Nwankwo, Lauri Goldkind, Angela Zhou

Revisiting Active Sequential Prediction-Powered Mean Estimation

In this work, we revisit the problem of active sequential prediction-powered mean estimation, where at each round one must decide the query probability of the ground-truth label upon observing the covariates of a sample. Furthermore, if the…

Machine Learning · Statistics · 2026-04-21 · Maria-Eleni Sfyraki, Jun-Kun Wang
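For context on the setting described above, here is a minimal sketch of the standard inverse-probability-weighted prediction-powered mean estimator, $\hat\theta = \frac{1}{n}\sum_i \left[f(x_i) + \frac{\xi_i}{\pi_i}\,(y_i - f(x_i))\right]$, where $\xi_i \sim \mathrm{Bernoulli}(\pi_i)$ indicates whether the label of sample $i$ was queried. Assuming every $\pi_i > 0$ and queries depend only on the covariates, $\mathbb{E}[\xi_i/\pi_i] = 1$, so the correction term removes the model's bias on average. This is a generic batch illustration, not the sequential query rule or estimator studied in the listed paper; the data, model bias, and query probabilities below are hypothetical.

```python
import numpy as np

def ipw_prediction_powered_mean(preds, labels, query_probs, queried):
    """Prediction-powered mean with inverse-probability-weighted label corrections.

    preds:       model predictions f(x_i) for all n samples
    labels:      ground-truth y_i; only entries with queried[i] True are used
    query_probs: query probabilities pi_i, each in (0, 1]
    queried:     boolean mask xi_i, True where the label was obtained
    """
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    query_probs = np.asarray(query_probs, dtype=float)
    queried = np.asarray(queried, dtype=bool)
    correction = np.zeros_like(preds)
    # Rescale each observed residual by 1/pi_i so its expectation equals y_i - f(x_i).
    correction[queried] = (labels[queried] - preds[queried]) / query_probs[queried]
    return float(np.mean(preds + correction))

# Hypothetical simulation: the true mean of y is 1.0 and the model is biased by -0.2.
rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
y = 1.0 + x + 0.5 * rng.normal(size=n)
f = 0.8 + x
# Hypothetical covariate-dependent query rule, bounded away from zero.
pi = np.clip(0.05 + 0.2 * np.abs(x), 0.05, 1.0)
xi = rng.random(n) < pi

naive = float(np.mean(f))
estimate = ipw_prediction_powered_mean(f, y, pi, xi)
print(f"predictions only: {naive:.3f}, prediction-powered: {estimate:.3f}")
```

Averaging the predictions alone inherits the model's bias, while the reweighted residuals from the queried labels correct it; choosing the $\pi_i$ well is what governs the variance of the estimate.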

FUSE: Ensembling Verifiers with Zero Labeled Data

Verification of model outputs is rapidly emerging as a key primitive for both training and real-world deployment of large language models (LLMs). In practice, this often involves using imperfect LLM judges and reward models since ground…

Machine Learning · Statistics · 2026-04-21 · Joonhyuk Lee, Virginia Ma, Sarah Zhao, Yash Nair, Asher Spector, Regev Cohen, Emmanuel J. Candès

Random Matrix Theory of Early-Stopped Gradient Flow: A Transient BBP Scenario

Empirical studies of trained models often report a transient regime in which signal is detectable in a finite gradient descent time window before overfitting dominates. We provide an analytically tractable random-matrix model that…

Machine Learning · Statistics · 2026-04-21 · Florentin Coeurdoux, Grégoire Ferré, Jean-Philippe Bouchaud

Spectral bandits for smooth graph functions

Smooth functions on graphs have wide applications in manifold and semi-supervised learning. In this paper, we study a bandit problem where the payoffs of arms are smooth on a graph. This framework is suitable for solving online learning…

Machine Learning · Statistics · 2026-04-21 · Michal Valko, Rémi Munos, Branislav Kveton, Tomáš Kocák
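One common way to make "smooth on a graph" precise is the Laplacian quadratic form $f^\top L f = \sum_{(i,j)} w_{ij}(f_i - f_j)^2$, which is small when neighboring nodes take similar values. The abstract is truncated before it says how the paper defines smoothness, so this choice is an assumption. Below is a minimal numpy sketch on a hypothetical 10-node path graph; it illustrates the smoothness measure and the Laplacian eigenbasis only, not the paper's bandit algorithm.

```python
import numpy as np

def path_graph_laplacian(n):
    """Combinatorial Laplacian L = D - W of an unweighted path graph on n nodes."""
    W = np.zeros((n, n))
    for i in range(n - 1):
        W[i, i + 1] = W[i + 1, i] = 1.0
    D = np.diag(W.sum(axis=1))
    return D - W

def smoothness(f, L):
    """Quadratic form f^T L f, i.e. the sum over edges (i, j) of w_ij * (f_i - f_j)^2."""
    return float(f @ L @ f)

n = 10
L = path_graph_laplacian(n)
smooth_f = np.linspace(0.0, 1.0, n)                  # changes slowly along the path
rough_f = np.array([(-1.0) ** i for i in range(n)])  # flips sign on every edge
print(smoothness(smooth_f, L), smoothness(rough_f, L))

# eigh returns eigenvalues in ascending order; the eigenvectors form a graph
# "frequency" basis, and smooth functions concentrate on the low-eigenvalue ones.
eigvals, eigvecs = np.linalg.eigh(L)
low_freq_share = np.sum((eigvecs[:, :2].T @ smooth_f) ** 2) / np.sum(smooth_f ** 2)
print(round(float(low_freq_share), 3))
```

The ramp scores 1/9 while the alternating function scores 36, and nearly all of the ramp's energy lies on the two eigenvectors with the smallest eigenvalues, which is the intuition behind representing payoffs in the Laplacian eigenbasis.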

Adaptive Kernel Selection for Kernelized Diffusion Maps

Selecting an appropriate kernel is a central challenge in kernel-based spectral methods. In Kernelized Diffusion Maps (KDM), the kernel determines the accuracy of the RKHS estimator of a diffusion-type operator and hence the quality…

Machine Learning · Statistics · 2026-04-21 · Othmane Aboussaad, Adam Miraoui, Boumediene Hamzi, Houman Owhadi

Overcoming Selection Bias in Statistical Studies With Amortized Bayesian Inference

Selection bias arises when the probability that an observation enters a dataset depends on variables related to the quantities of interest, leading to systematic distortions in estimation and uncertainty quantification. For example, in…

Machine Learning · Statistics · 2026-04-21 · Jonas Arruda, Sophie Chervet, Paula Staudt, Andreas Wieser, Michael Hoelscher, Isabelle Sermet-Gaudelus, Nadine Binder, Lulla Opatowski, Jan Hasenauer

Symmetry Guarantees Statistic Recovery in Variational Inference

Variational inference (VI) is a central tool in modern machine learning, used to approximate an intractable target density by optimising over a tractable family of distributions. As the variational family cannot typically represent the…

Machine Learning · Statistics · 2026-04-21 · Daniel Marks, Dario Paccagnan, Mark van der Wilk