Machine Learning - Scifaro

Forecast collapse of transformer-based models under squared loss in financial time series

We study trajectory forecasting under squared loss for time series with weak conditional structure, using highly expressive prediction models. Building on the classical characterization of squared-loss risk minimization, we emphasize…

Machine Learning · Statistics 2026-04-02 Pierre Andreoletti

Scaled Gradient Descent for Ill-Conditioned Low-Rank Matrix Recovery with Optimal Sampling Complexity

The low-rank matrix recovery problem seeks to reconstruct an unknown $n_1 \times n_2$ rank-$r$ matrix from $m$ linear measurements, where $m\ll n_1n_2$. This problem has been extensively studied over the past few decades, leading to a…

Machine Learning · Statistics 2026-04-02 Zhenxuan Li, Meng Huang

Isomorphic Functionalities between Ant Colony and Ensemble Learning: Part II-On the Strength of Weak Learnability and the Boosting Paradigm

In Part I of this series, we established a rigorous mathematical isomorphism between ant colony decision-making and random forest learning, demonstrating that variance reduction through decorrelation is a universal principle shared by…

Machine Learning · Statistics 2026-04-02 Ernest Fokoué, Gregory Babbitt, Yuval Levental

Closed-form conditional diffusion models for data assimilation

We propose closed-form conditional diffusion models for data assimilation. Diffusion models use data to learn the score function (defined as the gradient of the log-probability density of a data distribution), allowing them to generate new…

Machine Learning · Statistics 2026-04-02 Brianna Binder, Agnimitra Dasgupta, Assad Oberai

Taxonomy-Conditioned Hierarchical Bayesian TSB Models for Heterogeneous Intermittent Demand Forecasting

Intermittent demand forecasting poses unique challenges due to sparse observations, cold-start items, and obsolescence. Classical models such as Croston, SBA, and the Teunter–Syntetos–Babai (TSB) method provide simple heuristics but lack…

Machine Learning · Statistics 2026-04-02 Zong-Han Bai, Po-Yen Chu

E-Scores for (In)Correctness Assessment of Generative Model Outputs

While generative models, especially large language models (LLMs), are ubiquitous in today's world, principled mechanisms to assess their (in)correctness are limited. Using the conformal prediction framework, previous works construct sets of…

Machine Learning · Statistics 2026-04-02 Guneet S. Dhillon, Javier González, Teodora Pandeva, Alicia Curth

Convergence of off-policy TD(0) with linear function approximation for reversible Markov chains

We study the convergence of off-policy TD(0) with linear function approximation when used to approximate the expected discounted reward in a Markov chain. It is well known that the combination of off-policy learning and function…

Machine Learning · Statistics 2026-04-02 Maik Overmars, Jasper Goseling, Richard Boucherie

Disentanglement of Sources in a Multi-Stream Variational Autoencoder

Variational autoencoders (VAEs) are among the leading approaches to address the problem of learning disentangled representations. Typically, a single VAE is used and disentangled representations are sought within its single continuous latent…

Machine Learning · Statistics 2026-04-02 Veranika Boukun, Jörg Lücke

Conditional Flow Matching for Bayesian Posterior Inference

We propose a generative multivariate posterior sampler via flow matching. It offers a simple training objective, and does not require access to likelihood evaluation. The method learns a dynamic, block-triangular velocity field in the joint…

Machine Learning · Statistics 2026-04-02 Percy S. Zhai, So Won Jeong, Veronika Ročková

Beyond Real Data: Synthetic Data through the Lens of Regularization

Synthetic data can improve generalization when real data is scarce, but excessive reliance may introduce distributional mismatches that degrade performance. In this paper, we present a learning-theoretic framework to quantify the trade-off…

Machine Learning · Statistics 2026-04-02 Amitis Shidani, Tyler Farghly, Yang Sun, Habib Ganjgahi, George Deligiannidis

Representative, Informative, and De-Amplifying: Requirements for Robust Bayesian Active Learning under Model Misspecification

In many science and industry settings, a central challenge is designing experiments under time and budget constraints. Bayesian Optimal Experimental Design (BOED) is a paradigm to pick maximally informative designs that has been widely…

Machine Learning · Statistics 2026-04-02 Roubing Tang, Sabina J. Sloman, Samuel Kaski

A Pure Hypothesis Test for Inhomogeneous Random Graph Models Based on a Kernelised Stein Discrepancy

Complex data are often represented as a graph, which in turn can often be viewed as a realisation of a random graph, such as an inhomogeneous random graph model (IRG). For general fast goodness-of-fit tests in high dimensions, kernelised…

Machine Learning · Statistics 2026-04-02 Anum Fatima, Gesine Reinert

Adaptive Diffusion Guidance via Stochastic Optimal Control

Guidance is a cornerstone of modern diffusion models, playing a pivotal role in conditional generation and enhancing the quality of unconditional samples. However, current approaches to guidance scheduling – determining the appropriate…

Machine Learning · Statistics 2026-04-02 Iskander Azangulov, Peter Potaptchik, Qinyu Li, Eddie Aamari, George Deligiannidis, Judith Rousseau

No-Regret Generative Modeling via Parabolic Monge-Ampère PDE

We introduce a novel generative modeling framework based on a discretized parabolic Monge-Ampère PDE, which emerges as a continuous limit of the Sinkhorn algorithm commonly used in optimal transport. Our method performs iterative…

Machine Learning · Statistics 2026-04-02 Nabarun Deb, Tengyuan Liang

Identifying Drift, Diffusion, and Causal Structure from Temporal Snapshots

Stochastic differential equations (SDEs) are a fundamental tool for modelling dynamic processes, including gene regulatory networks (GRNs), contaminant transport, financial markets, and image generation. However, learning the underlying SDE…

Machine Learning · Statistics 2026-04-02 Vincent Guan, Joseph Janssen, Hossein Rahmani, Andrew Warren, Stephen Zhang, Elina Robeva, Geoffrey Schiebinger

Scale-adaptive and robust intrinsic dimension estimation via optimal neighbourhood identification

The Intrinsic Dimension (ID) is a key concept in unsupervised learning and feature selection, as it is a lower bound to the number of variables which are necessary to describe a system. However, in almost any real-world dataset the ID…

Machine Learning · Statistics 2026-04-02 Antonio Di Noia, Iuri Macocco, Aldo Glielmo, Alessandro Laio, Antonietta Mira

Pure Differential Privacy for Functional Summaries with a Laplace-like Process

Many existing mechanisms for achieving differential privacy (DP) on infinite-dimensional functional summaries typically involve embedding these functional summaries into finite-dimensional subspaces and applying traditional multivariate DP…

Machine Learning · Statistics 2026-04-02 Haotian Lin, Matthew Reimherr

MCMC-Correction of Score-Based Diffusion Models for Model Composition

Diffusion models can be parameterized in terms of either score or energy function. The energy parameterization is attractive as it enables sampling procedures such as Markov Chain Monte Carlo (MCMC) that incorporates a Metropolis–Hastings…

Machine Learning · Statistics 2026-04-02 Anders Sjöberg, Jakob Lindqvist, Magnus Önnheim, Mats Jirstrand, Lennart Svensson

mlr3mbo: Bayesian Optimization in R

We present mlr3mbo, a comprehensive and modular toolbox for Bayesian optimization in R. mlr3mbo supports single- and multi-objective optimization, multi-point proposals, batch and asynchronous parallelization, input and output…

Machine Learning · Statistics 2026-04-01 Marc Becker, Lennart Schneider, Martin Binder, Lars Kotthoff, Bernd Bischl

Unbounded Density Ratio Estimation and Its Application to Covariate Shift Adaptation

This paper focuses on the problem of unbounded density ratio estimation, an understudied yet critical challenge in statistical learning, and its application to covariate shift adaptation. Much of the existing literature assumes that the…

Machine Learning · Statistics 2026-04-01 Ren-Rui Liu, Jun Fan, Lei Shi, Zheng-Chu Guo