Machine Learning - Scifaro

Distribution-Dependent Rates for Multi-Distribution Learning

To address the needs of modeling uncertainty in sensitive machine learning applications, the setup of distributionally robust optimization (DRO) seeks good performance uniformly across a variety of tasks. The recent multi-distribution…

Machine Learning · Statistics 2026-01-01 Rafael Hanashiro, Patrick Jaillet

Are Ensembles Getting Better all the Time?

Ensemble methods combine the predictions of several base models. We study whether or not including more models always improves their average performance. This question depends on the kind of ensemble considered, as well as the predictive…

Machine Learning · Statistics 2026-01-01 Pierre-Alexandre Mattei, Damien Garreau

Generative Modelling of Lévy Area for High Order SDE Simulation

It is well understood that, when numerically simulating SDEs with general noise, achieving a strong convergence rate better than $O(\sqrt{h})$ (where $h$ is the step size) requires the use of certain iterated integrals of Brownian motion,…

Machine Learning · Statistics 2026-01-01 Andraž Jelinčič, Jiajie Tao, William F. Turner, Thomas Cass, James Foster, Hao Ni

Efficient Active Learning with Abstention

The goal of active learning is to achieve the same accuracy achievable by passive learning, while using far fewer labels. Exponential savings in terms of label complexity have been proved in very special cases, but fundamental lower bounds…

Machine Learning · Statistics 2026-01-01 Yinglun Zhu, Robert Nowak

Learning to sample fibers for goodness-of-fit testing

We consider the problem of constructing exact goodness-of-fit tests for discrete exponential family models. This classical problem remains practically unsolved for many types of structured or sparse data, as it rests on a computationally…

Machine Learning · Statistics 2025-12-31 Ivan Gvozdanović, Sonja Petrović

The Nonstationarity-Complexity Tradeoff in Return Prediction

We investigate machine learning models for stock return prediction in non-stationary environments, revealing a fundamental nonstationarity-complexity tradeoff: complex models reduce misspecification error but require longer training windows…

Machine Learning · Statistics 2025-12-30 Agostino Capponi, Chengpiao Huang, J. Antonio Sidaoui, Kaizheng Wang, Jiacheng Zou

Probabilistic Modelling is Sufficient for Causal Inference

Causal inference is a key research area in machine learning, yet confusion reigns over the tools needed to tackle it. There are prevalent claims in the machine learning literature that you need a bespoke causal framework or notation to…

Machine Learning · Statistics 2025-12-30 Bruno Mlodozeniec, David Krueger, Richard E. Turner

Federated Learning With L0 Constraint Via Probabilistic Gates For Sparsity

Federated Learning (FL) is a distributed machine learning setting that requires multiple clients to collaborate on training a model while maintaining data privacy. The unaddressed inherent sparsity in data and models often results in overly…

Machine Learning · Statistics 2025-12-30 Krishna Harsha Kovelakuntla Huthasana, Alireza Olama, Andreas Lundell

JADAI: Jointly Amortizing Adaptive Design and Bayesian Inference

We consider problems of parameter estimation where design variables can be actively optimized to maximize information gain. To this end, we introduce JADAI, a framework that jointly amortizes Bayesian adaptive design and inference by…

Machine Learning · Statistics 2025-12-30 Niels Bracher, Lars Kühmichel, Desi R. Ivanova, Xavier Intes, Paul-Christian Bürkner, Stefan T. Radev

Likelihood-Preserving Embeddings for Statistical Inference

Modern machine learning embeddings provide powerful compression of high-dimensional data, yet they typically destroy the geometric structure required for classical likelihood-based statistical inference. This paper develops a rigorous…

Machine Learning · Statistics 2025-12-30 Deniz Akdemir

A General Weighting Theory for Ensemble Learning: Beyond Variance Reduction via Spectral and Geometric Structure

Ensemble learning is traditionally justified as a variance-reduction strategy, explaining its strong performance for unstable predictors such as decision trees. This explanation, however, does not account for ensembles constructed from…

Machine Learning · Statistics 2025-12-30 Ernest Fokoué

On Fibonacci Ensembles: An Alternative Approach to Ensemble Learning Inspired by the Timeless Architecture of the Golden Ratio

Nature rarely reveals her secrets bluntly, yet in the Fibonacci sequence she grants us a glimpse of her quiet architecture of growth, harmony, and recursive stability (Koshy, 2001; Livio, 2002). From spiral galaxies to…

Machine Learning · Statistics 2025-12-30 Ernest Fokoué

A review of NMF, PLSA, LBA, EMA, and LCA with a focus on the identifiability issue

Across fields such as machine learning, social science, and geography, considerable attention has been given to models that factorize a nonnegative matrix into the product of two or three matrices, subject to nonnegative or row-sum-to-1…

Machine Learning · Statistics 2025-12-30 Qianqian Qi, Peter G. M. van der Heijden

Copula Discrepancy: Benchmarking Dependence Structure

We study a simple statistic for benchmarking how well a sample preserves a known bivariate dependence structure. Given a target copula family (Clayton or Gumbel) and parameter $\theta_P$, the Copula Discrepancy (CD) compares the target…

Machine Learning · Statistics 2025-12-30 Agnideep Aich, Ashit Baran Aich

Neural Measures for learning distributions of Random PDEs

The integration of Scientific Machine Learning (SciML) techniques with uncertainty quantification (UQ) represents a rapidly evolving frontier in computational science. This work advances Physics-Informed Neural Networks (PINNs) by…

Machine Learning · Statistics 2025-12-30 Georgios Arampatzis, Stylianos Katsarakis, Charalambos Makridakis

Poisson-Process Topic Model for Integrating Knowledge from Pre-trained Language Models

Topic modeling is traditionally applied to word counts without accounting for the context in which words appear. Recent advancements in large language models (LLMs) offer contextualized word embeddings, which capture deeper meaning and…

Machine Learning · Statistics 2025-12-30 Morgane Austern, Yuanchuan Guo, Zheng Tracy Ke, Tianle Liu

A Unified View of Optimal Kernel Hypothesis Testing

This paper provides a unifying view of optimal kernel hypothesis testing across the MMD two-sample, HSIC independence, and KSD goodness-of-fit frameworks. Minimax optimal separation rates in the kernel and $L^2$ metrics are presented, with…

Machine Learning · Statistics 2025-12-30 Antonin Schrab

Computational Lower Bounds for Correlated Random Graphs via Algorithmic Contiguity

In this paper, assuming the low-degree conjecture, we provide evidence of computational hardness for two problems: (1) the (partial) matching recovery problem in the sparse correlated Erdős-Rényi graphs $\mathcal G(n,q;\rho)$ when the…

Machine Learning · Statistics 2025-12-30 Zhangsong Li

Gaussian entropic optimal transport: Schrödinger bridges and the Sinkhorn algorithm

Entropic optimal transport problems are regularized versions of optimal transport problems. These models play an increasingly important role in machine learning and generative modelling. For finite spaces, these problems are commonly solved…

Machine Learning · Statistics 2025-12-30 O. Deniz Akyildiz, Pierre Del Moral, Joaquín Miguez

Nonlinearity and Uncertainty Informed Moment-Matching Gaussian Mixture Splitting

Many problems in navigation and tracking require increasingly accurate characterizations of the evolution of uncertainty in nonlinear systems. Nonlinear uncertainty propagation approaches based on Gaussian mixture density approximations…

Machine Learning · Statistics 2025-12-30 Jackson Kulik, Keith A. LeGrand