Machine Learning - Scifaro

Forecasting Automotive Supply Chain Shortfalls with Heterogeneous Time Series

Operational disruptions can significantly impact a company's performance. Ford, with its 37 plants globally, uses 17 billion parts annually to manufacture six million cars and trucks. With up to ten tiers of suppliers between the company and…

Machine Learning · Statistics 2025-06-17 Bach Viet Do, Xingyu Li, Chaoye Pan

Efficient Numerical Integration in Reproducing Kernel Hilbert Spaces via Leverage Scores Sampling

In this work we consider the problem of numerical integration, i.e., approximating integrals with respect to a target probability measure using only pointwise evaluations of the integrand. We focus on the setting in which the target…

Machine Learning · Statistics 2025-06-17 Antoine Chatalic, Nicolas Schreuder, Ernesto De Vito, Lorenzo Rosasco

Spectral Estimation with Free Decompression

Computing eigenvalues of very large matrices is a critical task in many machine learning applications, including the evaluation of log-determinants, the trace of matrix functions, and other important metrics. As datasets continue to grow in…

Machine Learning · Statistics 2025-06-16 Siavash Ameli, Chris van der Heide, Liam Hodgkinson, Michael W. Mahoney

Learning Overspecified Gaussian Mixtures Exponentially Fast with the EM Algorithm

We investigate the convergence properties of the EM algorithm when applied to overspecified Gaussian mixture models -- that is, when the number of components in the fitted model exceeds that of the true underlying distribution. Focusing on…

Machine Learning · Statistics 2025-06-16 Zhenisbek Assylbekov, Alan Legg, Artur Pak

Bayesian Optimization with Inexact Acquisition: Is Random Grid Search Sufficient?

Bayesian optimization (BO) is a widely used iterative algorithm for optimizing black-box functions. Each iteration requires maximizing an acquisition function, such as the upper confidence bound (UCB) or a sample path from the Gaussian…

Machine Learning · Statistics 2025-06-16 Hwanwoo Kim, Chong Liu, Yuxin Chen

Using Deep Operators to Create Spatio-temporal Surrogates for Dynamical Systems under Uncertainty

Spatio-temporal data, which consists of responses or measurements gathered at different times and positions, is ubiquitous across diverse applications of civil infrastructure. While SciML methods have made significant progress in tackling…

Machine Learning · Statistics 2025-06-16 Jichuan Tang, Patrick T. Brewick, Ryan G. McClarren, Christopher Sweet

Collaborative Prediction: To Join or To Disjoin Datasets

With the recent rise of generative Artificial Intelligence (AI), the need to select high-quality datasets to improve machine learning models has garnered increasing attention. However, some parts of this topic remain underexplored, even…

Machine Learning · Statistics 2025-06-16 Kyung Rok Kim, Yansong Wang, Xiaocheng Li, Guanting Chen

A Framework for Non-Linear Attention via Modern Hopfield Networks

In this work we propose an energy functional along the lines of Modern Hopfield Networks (MHN), the stationary points of which correspond to the attention due to Vaswani et al. [12], thus unifying both frameworks. The minima of this…

Machine Learning · Statistics 2025-06-16 Ahmed Farooq

Practical Improvements of A/B Testing with Off-Policy Estimation

We address the problem of A/B testing, a widely used protocol for evaluating the potential improvement achieved by a new decision system compared to a baseline. This protocol segments the population into two subgroups, each exposed to a…

Machine Learning · Statistics 2025-06-16 Otmane Sakhi, Alexandre Gilotte, David Rohde

Guiding Time-Varying Generative Models with Natural Gradients on Exponential Family Manifold

Optimising probabilistic models is a well-studied field in statistics. However, its connection with the training of generative models remains largely under-explored. In this paper, we show that the evolution of time-varying generative…

Machine Learning · Statistics 2025-06-16 Song Liu, Leyang Wang, Yakun Wang

Gaussian Process Regression for Inverse Problems in Linear PDEs

This paper introduces a computationally efficient algorithm in system theory for solving inverse problems governed by linear partial differential equations (PDEs). We model solutions of linear PDEs using Gaussian processes with priors…

Machine Learning · Statistics 2025-06-16 Xin Li, Markus Lange-Hegermann, Bogdan Raiţă

What Exactly Does Guidance Do in Masked Discrete Diffusion Models

We study masked discrete diffusion models with classifier-free guidance (CFG). Assuming neither score error nor discretization error, we derive an explicit solution to the guided reverse dynamics, so that how guidance influences the sampling…

Machine Learning · Statistics 2025-06-13 He Ye, Rojas Kevin, Tao Molei

Measuring Semantic Information Production in Generative Diffusion Models

It is well known that semantic and structural features of the generated images emerge at different times during the reverse dynamics of diffusion, a phenomenon that has been connected to physical phase transitions in magnets and other…

Machine Learning · Statistics 2025-06-13 Florian Handke, Félix Koulischer, Gabriel Raya, Luca Ambrogioni

Fundamental Limits of Learning High-dimensional Simplices in Noisy Regimes

In this paper, we establish sample complexity bounds for learning high-dimensional simplices in $\mathbb{R}^K$ from noisy data. Specifically, we consider $n$ i.i.d. samples uniformly drawn from an unknown simplex in $\mathbb{R}^K$, each…

Machine Learning · Statistics 2025-06-13 Seyed Amir Hossein Saberi, Amir Najafi, Abolfazl Motahari, Babak H. Khalaj

Hybrid Bernstein Normalizing Flows for Flexible Multivariate Density Regression with Interpretable Marginals

Density regression models allow a comprehensive understanding of data by modeling the complete conditional probability distribution. While flexible estimation approaches such as normalizing flows (NF) work particularly well in multiple…

Machine Learning · Statistics 2025-06-13 Marcel Arpogaus, Thomas Kneib, Thomas Nagler, David Rügamer

Generative Modeling with Diffusion

We provide an overview of the diffusion model as a method to generate new samples. Generative models have been recently adopted for tasks such as art generation (Stable Diffusion, Dall-E) and text generation (ChatGPT). Diffusion models in…

Machine Learning · Statistics 2025-06-13 Justin Le

Debiasing Watermarks for Large Language Models via Maximal Coupling

Watermarking language models is essential for distinguishing between human and machine-generated text and thus maintaining the integrity and trustworthiness of digital communication. We present a novel green/red list watermarking approach…

Machine Learning · Statistics 2025-06-13 Yangxinyu Xie, Xiang Li, Tanwi Mallick, Weijie J. Su, Ruixun Zhang

Agnostic Smoothed Online Learning without Knowledge of the Base Measure

Classical results in statistical learning typically consider two extreme data-generating models: i.i.d. instances from an unknown distribution, or fully adversarial instances, often much more challenging statistically. To bridge the gap…

Machine Learning · Statistics 2025-06-13 Moïse Blanchard

General targeted machine learning for modern causal mediation analysis

Causal mediation analyses investigate the mechanisms through which causes exert their effects, and are therefore central to scientific progress. The literature on the non-parametric definition and identification of mediational effects in…

Machine Learning · Statistics 2025-06-13 Richard Liu, Nicholas T. Williams, Kara E. Rudolph, Iván Díaz

Flexible Tails for Normalizing Flows

Normalizing flows are a flexible class of probability distributions, expressed as transformations of a simple base distribution. A limitation of standard normalizing flows is representing distributions with heavy tails, which arise in…

Machine Learning · Statistics 2025-06-13 Tennessee Hickling, Dennis Prangle