Machine Learning - Scifaro

Complexity Dependent Error Rates for Physics-informed Statistical Learning via the Small-ball Method

Physics-informed statistical learning (PISL) integrates empirical data with physical knowledge to enhance the statistical performance of estimators. While PISL methods are widely used in practice, a comprehensive theoretical understanding…

Machine Learning · Statistics 2025-10-28 Diego Marcondes

Coupled Flow Matching

We introduce Coupled Flow Matching (CPFM), a framework that integrates controllable dimensionality reduction and high-fidelity reconstruction. CPFM learns coupled continuous flows for both the high-dimensional data x and the low-dimensional…

Machine Learning · Statistics 2025-10-28 Wenxi Cai, Yuheng Wang, Naichen Shi

OEUVRE: OnlinE Unbiased Variance-Reduced loss Estimation

Online learning algorithms continually update their models as data arrive, making it essential to accurately estimate the expected loss at the current time step. The prequential method is an effective estimation approach which can be…

Machine Learning · Statistics 2025-10-28 Kanad Pardeshi, Bryan Wilder, Aarti Singh

Block Coordinate Descent for Neural Networks Provably Finds Global Minima

In this paper, we consider a block coordinate descent (BCD) algorithm for training deep neural networks and provide a new global convergence guarantee under strictly monotonically increasing activation functions. While existing works…

Machine Learning · Statistics 2025-10-28 Shunta Akiyama

Semi-Supervised Learning under General Causal Models

Semi-supervised learning (SSL) aims to train a machine learning model using both labelled and unlabelled data. While the unlabelled data have been used in various ways to improve the prediction accuracy, the reason why unlabelled data could…

Machine Learning · Statistics 2025-10-28 Archer Moore, Heejung Shim, Jingge Zhu, Mingming Gong

Statistical Analysis of the Sinkhorn Iterations for Two-Sample Schrödinger Bridge Estimation

The Schrödinger bridge problem seeks the optimal stochastic process that connects two given probability distributions with minimal energy modification. While the Sinkhorn algorithm is widely used to solve the static optimal transport…

Machine Learning · Statistics 2025-10-28 Ibuki Maeda, Rentian Yao, Atsushi Nitanda

MetaCaDI: A Meta-Learning Framework for Scalable Causal Discovery with Unknown Interventions

Uncovering the underlying causal mechanisms of complex real-world systems remains a significant challenge, as these systems often entail high data collection costs and involve unknown interventions. We introduce MetaCaDI, the first…

Machine Learning · Statistics 2025-10-28 Hans Jarett Ong, Yoichi Chikahara, Tomoharu Iwata

Differentially Private High-dimensional Variable Selection via Integer Programming

Sparse variable selection improves interpretability and generalization in high-dimensional learning by selecting a small subset of informative features. Recent advances in Mixed Integer Programming (MIP) have enabled solving large-scale…

Machine Learning · Statistics 2025-10-28 Petros Prastakos, Kayhan Behdin, Rahul Mazumder

Input Adaptive Bayesian Model Averaging

This paper studies prediction with multiple candidate models, where the goal is to combine their outputs. This task is especially challenging in heterogeneous settings, where different models may be better suited to different inputs. We…

Machine Learning · Statistics 2025-10-28 Yuli Slavutsky, Sebastian Salazar, David M. Blei

Bridging Prediction and Attribution: Identifying Forward and Backward Causal Influence Ranges Using Assimilative Causal Inference

Causal inference identifies cause-and-effect relationships between variables. While traditional approaches rely on data to reveal causal links, a recently developed method, assimilative causal inference (ACI), integrates observations with…

Machine Learning · Statistics 2025-10-28 Marios Andreou, Nan Chen

Enforcing Calibration in Multi-Output Probabilistic Regression with Pre-rank Regularization

Probabilistic models must be well calibrated to support reliable decision-making. While calibration in single-output regression is well studied, defining and achieving multivariate calibration in multi-output regression remains considerably…

Machine Learning · Statistics 2025-10-28 Naomi Desobry, Elnura Zhalieva, Souhaib Ben Taieb

Neural variational inference for cutting feedback during uncertainty propagation

In many scientific applications, uncertainty of estimates from an earlier (upstream) analysis needs to be propagated in subsequent (downstream) Bayesian analysis, without feedback. Cutting feedback methods, also termed cut-Bayes, achieve…

Machine Learning · Statistics 2025-10-28 Jiafang Song, Sandipan Pramanik, Abhirup Datta

Variational Learning Finds Flatter Solutions at the Edge of Stability

Variational Learning (VL) has recently gained popularity for training deep neural networks. Part of its empirical success can be explained by theories such as PAC-Bayes bounds, minimum description length and marginal likelihood, but little…

Machine Learning · Statistics 2025-10-28 Avrajit Ghosh, Bai Cong, Rio Yokota, Saiprasad Ravishankar, Rongrong Wang, Molei Tao, Mohammad Emtiyaz Khan, Thomas Möllenhoff

Reconstruction and Prediction of Volterra Integral Equations Driven by Gaussian Noise

Integral equations are widely used in fields such as applied modeling, medical imaging, and system identification, providing a powerful framework for solving deterministic problems. While parameter identification for differential equations…

Machine Learning · Statistics 2025-10-28 Zhihao Xu, Saisai Ding, Zhikun Zhang, Xiangjun Wang

Deep Copula Classifier: Theory, Consistency, and Empirical Evaluation

We present the Deep Copula Classifier (DCC), a class-conditional generative model that separates marginal estimation from dependence modeling using neural copula densities. DCC is interpretable, Bayes-consistent, and achieves excess-risk…

Machine Learning · Statistics 2025-10-28 Agnideep Aich, Ashit Baran Aich

Statistical Inference under Performativity

Performativity of predictions refers to the phenomenon where prediction-informed decisions influence the very targets they aim to predict -- a dynamic commonly observed in policy-making, social sciences, and economics. In this paper, we…

Machine Learning · Statistics 2025-10-28 Xiang Li, Yunai Li, Huiying Zhong, Lihua Lei, Zhun Deng

DataRater: Meta-Learned Dataset Curation

The quality of foundation models depends heavily on their training data. Consequently, great efforts have been put into dataset curation. Yet most approaches rely on manual tuning of coarse-grained mixtures of large buckets of data, or…

Machine Learning · Statistics 2025-10-28 Dan A. Calian, Gregory Farquhar, Iurii Kemaev, Luisa M. Zintgraf, Matteo Hessel, Jeremy Shar, Junhyuk Oh, András György, Tom Schaul, Jeffrey Dean, Hado van Hasselt, David Silver

A conversion theorem and minimax optimality for continuum contextual bandits

We study the contextual continuum bandits problem, where the learner sequentially receives a side information vector and has to choose an action in a convex set, minimizing a function associated with the context. The goal is to minimize all…

Machine Learning · Statistics 2025-10-28 Arya Akhavan, Karim Lounici, Massimiliano Pontil, Alexandre B. Tsybakov

Fisher meets Feynman: score-based variational inference with a product of experts

We introduce a highly expressive yet distinctly tractable family for black-box variational inference (BBVI). Each member of this family is a weighted product of experts (PoE), and each weighted expert in the product is proportional to a…

Machine Learning · Statistics 2025-10-27 Diana Cai, Robert M. Gower, David M. Blei, Lawrence K. Saul

HollowFlow: Efficient Sample Likelihood Evaluation using Hollow Message Passing

Flow and diffusion-based models have emerged as powerful tools for scientific applications, particularly for sampling non-normalized probability distributions, as exemplified by Boltzmann Generators (BGs). A critical challenge in deploying…

Machine Learning · Statistics 2025-10-27 Johann Flemming Gloy, Simon Olsson