Machine Learning - Scifaro

A Gapped Scale-Sensitive Dimension and Lower Bounds for Offset Rademacher Complexity

We study gapped scale-sensitive dimensions of a function class in both sequential and non-sequential settings. We demonstrate that covering numbers for any uniformly bounded class are controlled above by these gapped dimensions,…

Machine Learning · Statistics · 2025-09-26 · Zeyu Jia, Yury Polyanskiy, Alexander Rakhlin

Sample completion, structured correlation, and Netflix problems

We develop a new high-dimensional statistical learning model which can take advantage of structured correlation in data even in the presence of randomness. We completely characterize learnability in this model in terms of…

Machine Learning · Statistics · 2025-09-26 · Leonardo N. Coregliano, Maryanthe Malliaris

Tensor State Space-based Dynamic Multilayer Network Modeling

Understanding the complex interactions within dynamic multilayer networks is critical for advancements in various scientific domains. Existing models often fail to capture such networks' temporal and cross-layer dynamics. This paper…

Machine Learning · Statistics · 2025-09-26 · Tian Lan, Jie Guo, Chen Zhang

Hybrid Summary Statistics

We present a way to capture high-information posteriors from training sets that are sparsely sampled over the parameter space for robust simulation-based inference. In physical inference problems, we can often apply domain knowledge to…

Machine Learning · Statistics · 2025-09-26 · T. Lucas Makinen, Ce Sui, Benjamin D. Wandelt, Natalia Porqueres, Alan Heavens

Towards Complete Causal Explanation with Expert Knowledge

We study the problem of restricting a Markov equivalence class of maximal ancestral graphs (MAGs) to only those MAGs that contain certain edge marks, which we refer to as expert or orientation knowledge. Such a restriction of the Markov…

Machine Learning · Statistics · 2025-09-26 · Aparajithan Venkateswaran, Emilija Perković

Optimal Sampling Designs for Multi-dimensional Streaming Time Series with Application to Power Grid Sensor Data

Internet of Things (IoT) systems generate massive volumes of high-speed, temporally correlated streaming data and are often tied to online inference tasks under computational or energy constraints. Online analysis of these streaming time…

Machine Learning · Statistics · 2025-09-26 · Rui Xie, Shuyang Bai, Ping Ma

Error Propagation in Dynamic Programming: From Stochastic Control to Option Pricing

This paper investigates theoretical and methodological foundations for stochastic optimal control (SOC) in discrete time. We start by formulating the control problem in a general dynamic programming framework, introducing the mathematical…

Machine Learning · Statistics · 2025-09-25 · Andrea Della Vecchia, Damir Filipović

First-Extinction Law for Resampling Processes

Extinction times in resampling processes are fundamental yet often intractable, as previous formulas scale as $2^M$ with the number of states $M$ present in the initial probability distribution. We solve this by treating multinomial updates…

Machine Learning · Statistics · 2025-09-25 · Matteo Benati, Alessandro Londei, Denise Lanzieri, Vittorio Loreto
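The snippet for "First-Extinction Law for Resampling Processes" does not spell out the model, so the following is only a hedged sketch under our own assumption of neutral multinomial (Wright–Fisher-style) resampling: a fixed population of N individuals spread over M states, redrawn each step from the current frequencies. It estimates the first-extinction time by brute-force Monte Carlo simulation, a baseline rather than the paper's analytical law; `first_extinction_time` is a hypothetical helper name.

```python
import random
from collections import Counter

def first_extinction_time(counts, rng, max_steps=100_000):
    """Simulate neutral multinomial resampling and return the first step at which
    any initially present state disappears (None if none does within max_steps).

    counts[i] is the initial number of individuals in state i; all must be positive.
    """
    if any(c <= 0 for c in counts):
        raise ValueError("every initial state must be present")
    n = sum(counts)                      # population size, kept fixed over time
    states = range(len(counts))
    current = list(counts)
    for t in range(1, max_steps + 1):
        # Draw n individuals with probabilities proportional to the current counts.
        tally = Counter(rng.choices(states, weights=current, k=n))
        current = [tally[s] for s in states]
        if min(current) == 0:
            return t
    return None

rng = random.Random(0)
# Two states with one individual each: a step ends the run iff both draws pick
# the same state (probability 1/2), so the mean first-extinction time is about 2.
times = [first_extinction_time([1, 1], rng) for _ in range(20_000)]
print(sum(times) / len(times))
```

Simulation cost grows with both N and the extinction time, which is one reason closed-form results like the one the abstract describes are useful.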

High-Dimensional Statistical Process Control via Manifold Fitting and Learning

We address the Statistical Process Control (SPC) of high-dimensional, dynamic industrial processes from two complementary perspectives: manifold fitting and manifold learning, both of which assume data lies on an underlying nonlinear, lower…

Machine Learning · Statistics · 2025-09-25 · Burak I. Tas, Enrique del Castillo

Convex Regression with a Penalty

A common way to estimate an unknown convex regression function $f_0: \Omega \subset \mathbb{R}^d \rightarrow \mathbb{R}$ from a set of $n$ noisy observations is to fit a convex function that minimizes the sum of squared errors. However,…

Machine Learning · Statistics · 2025-09-25 · Eunji Lim

MAGIC: Multi-task Gaussian process for joint imputation and classification in healthcare time series

Time series analysis has emerged as an important tool for improving patient diagnosis and management in healthcare applications. However, these applications commonly face two critical challenges: time misalignment and data sparsity.…

Machine Learning · Statistics · 2025-09-25 · Dohyun Ku, Catherine D. Chong, Visar Berisha, Todd J. Schwedt, Jing Li

Anchored Langevin Algorithms

Standard first-order Langevin algorithms such as the unadjusted Langevin algorithm (ULA) are obtained by discretizing the Langevin diffusion and are widely used for sampling in machine learning because they scale to high dimensions and…

Machine Learning · Statistics · 2025-09-25 · Mert Gurbuzbalaban, Hoang M. Nguyen, Xicheng Zhang, Lingjiong Zhu
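The "Anchored Langevin Algorithms" snippet refers to the standard unadjusted Langevin algorithm (ULA), obtained by discretizing the Langevin diffusion. As a rough illustration of plain ULA only (not the anchored variant the paper proposes), here is a minimal Python sketch assuming a one-dimensional standard Gaussian target $U(x) = x^2/2$; the helper name `ula` and the step size `h` are our own illustrative choices.

```python
import math
import random

def ula(grad_U, x0, step, n_steps, rng):
    """Unadjusted Langevin algorithm in 1D: Euler-Maruyama discretization of
    dX = -grad U(X) dt + sqrt(2) dW, with no Metropolis correction."""
    x = x0
    samples = []
    for _ in range(n_steps):
        x = x - step * grad_U(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

# Target: standard Gaussian, U(x) = x^2 / 2, so grad U(x) = x.
rng = random.Random(0)
h = 0.1
chain = ula(lambda x: x, x0=0.0, step=h, n_steps=200_000, rng=rng)
burn = chain[1000:]                      # discard a short burn-in
mean = sum(burn) / len(burn)
var = sum((x - mean) ** 2 for x in burn) / len(burn)
print(f"mean={mean:.3f} var={var:.3f} ULA stationary var={1 / (1 - h / 2):.3f}")
```

For this target the update is $x_{k+1} = (1-h)x_k + \sqrt{2h}\,\xi_k$, whose stationary variance is $1/(1 - h/2)$ rather than 1: a small, exact instance of the discretization bias that ULA accepts in exchange for skipping the accept/reject step.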

Propagation of Chaos in One-hidden-layer Neural Networks beyond Logarithmic Time

We study the approximation gap between the dynamics of a polynomial-width neural network and its infinite-width counterpart, both trained using projected gradient descent in the mean-field scaling regime. We demonstrate how to tightly bound…

Machine Learning · Statistics · 2025-09-25 · Margalit Glasgow, Denny Wu, Joan Bruna

Differentially Private Bootstrap: New Privacy Analysis and Inference Strategies

Differentially private (DP) mechanisms protect individual-level information by introducing randomness into the statistical analysis procedure. Despite the availability of numerous DP tools, there remains a lack of general techniques for…

Machine Learning · Statistics · 2025-09-25 · Zhanyu Wang, Guang Cheng, Jordan Awan

A Gradient Flow Approach to Solving Inverse Problems with Latent Diffusion Models

Solving ill-posed inverse problems requires powerful and flexible priors. We propose leveraging pretrained latent diffusion models for this task through a new training-free approach, termed Diffusion-regularized Wasserstein Gradient Flow…

Machine Learning · Statistics · 2025-09-24 · Tim Y. J. Wang, O. Deniz Akyildiz

Neighbor Embeddings Using Unbalanced Optimal Transport Metrics

This paper proposes the use of the Hellinger–Kantorovich metric from unbalanced optimal transport (UOT) in a dimensionality reduction and learning (supervised and unsupervised) pipeline. The performance of UOT is compared to that of…

Machine Learning · Statistics · 2025-09-24 · Muhammad Rana, Keaton Hamm

Consistency of Selection Strategies for Fraud Detection

This paper studies how insurers can choose which claims to investigate for fraud. Given a prediction model, typically only claims with the highest predicted probability of being fraudulent are investigated. We argue that this can lead to…

Machine Learning · Statistics · 2025-09-24 · Christos Revelas, Otilia Boldea, Bas J. M. Werker

End-Cut Preference in Survival Trees

The end-cut preference (ECP) problem, referring to the tendency to favor split points near the boundaries of a feature's range, is a well-known issue in CART (Breiman et al., 1984). ECP may induce highly imbalanced and biased splits,…

Machine Learning · Statistics · 2025-09-24 · Xiaogang Su
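To make the end-cut preference described in "End-Cut Preference in Survival Trees" concrete, the following Python sketch (our own illustration in the plain CART regression setting, not code from the paper and not a survival tree) picks the single split that minimizes within-node squared error on a response that is pure noise, then counts how often that best split leaves 5 or fewer observations in one child. The sample size, trial count, and the hypothetical `best_split_index` helper are illustrative assumptions.

```python
import random

def best_split_index(y):
    """Return k in 1..n-1 such that splitting (x-sorted) data into y[:k] and y[k:]
    minimizes total within-node sum of squared errors (CART regression criterion)."""
    n = len(y)
    total, total_sq = sum(y), sum(v * v for v in y)
    best_k, best_sse = None, float("inf")
    left_sum = left_sq = 0.0
    for k in range(1, n):
        left_sum += y[k - 1]
        left_sq += y[k - 1] ** 2
        right_sum, right_sq = total - left_sum, total_sq - left_sq
        sse = (left_sq - left_sum ** 2 / k) + (right_sq - right_sum ** 2 / (n - k))
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k

rng = random.Random(1)
n, trials, edge = 50, 2000, 5
hits = 0
for _ in range(trials):
    # Response independent of the feature, so the x-sorted order is just random.
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    k = best_split_index(y)
    if k <= edge or k >= n - edge:       # 5 or fewer points in one child
        hits += 1
print(f"end splits: {hits / trials:.2f} vs uniform {2 * edge / (n - 1):.2f}")
```

If split positions were chosen uniformly, only about 20% of the 49 candidate splits would count as end cuts; the simulated share comes out clearly higher even though the response carries no signal, which is the imbalance the abstract refers to.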

Surrogate Modelling of Proton Dose with Monte Carlo Dropout Uncertainty Quantification

Accurate proton dose calculation using Monte Carlo (MC) is computationally demanding in workflows like robust optimisation, adaptive replanning, and probabilistic inference, which require repeated evaluations. To address this, we develop a…

Machine Learning · Statistics · 2025-09-24 · Aaron Pim, Tristan Pryer

Functional effects models: Accounting for preference heterogeneity in panel data with machine learning

In this paper, we present a general specification for Functional Effects Models, which use Machine Learning (ML) methodologies to learn individual-specific preference parameters from socio-demographic characteristics, therefore accounting…

Machine Learning · Statistics · 2025-09-23 · Nicolas Salvadé, Tim Hillel