Machine Learning | Scifaro

Balancing Interpretability and Flexibility in Modeling Diagnostic Trajectories with an Embedded Neural Hawkes Process Model

The Hawkes process (HP) is commonly used to model event sequences with self-reinforcing dynamics, including electronic health records (EHRs). Traditional HPs capture self-reinforcement via parametric impact functions that can be inspected…

Machine Learning · Statistics 2025-10-23 Yuankang Zhao, Matthew Engelhard
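To make "parametric impact functions that can be inspected" concrete, here is a minimal sketch of a univariate Hawkes process with an exponential impact function $\alpha\beta e^{-\beta(t - t_i)}$. This is a common textbook choice assumed here: the entry does not say which parametric form the paper's traditional HP uses, and EHR models are usually multivariate. The conditional intensity is $\lambda(t) = \mu + \sum_{t_i < t} \alpha\beta e^{-\beta(t - t_i)}$, and the log-likelihood on $[0, T]$ is $\sum_i \log\lambda(t_i) - \int_0^T \lambda(s)\,ds$. The function names are hypothetical.

```python
import math


def hawkes_intensity(t, history, mu, alpha, beta):
    """Conditional intensity of a univariate Hawkes process.

    Assumes an exponential impact function alpha * beta * exp(-beta * dt):
    each past event t_i < t adds a bump that decays at rate beta, and
    alpha is the total excitation one event contributes (the bump's integral).
    """
    return mu + sum(
        alpha * beta * math.exp(-beta * (t - ti)) for ti in history if ti < t
    )


def hawkes_log_likelihood(events, T, mu, alpha, beta):
    """Log-likelihood of sorted event times in [0, T] under the model above."""
    # Sum of log-intensities at each event (only strictly earlier events excite it).
    log_term = sum(
        math.log(hawkes_intensity(ti, events, mu, alpha, beta)) for ti in events
    )
    # Closed-form compensator: the integral of the intensity over [0, T].
    compensator = mu * T + alpha * sum(
        1.0 - math.exp(-beta * (T - ti)) for ti in events
    )
    return log_term - compensator
```

Because the impact function is a closed-form expression, a fitted $\alpha$ (expected number of direct follow-on events per event) and $\beta$ (decay rate) can be read off directly; the process is stationary when $\alpha < 1$.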

Smoothed Distance Kernels for MMDs and Applications in Wasserstein Gradient Flows

Negative distance kernels $K(x,y) := - \|x-y\|$ have been used in the definition of maximum mean discrepancies (MMDs) in statistics and lead to favorable numerical results in various applications. In particular, so-called slicing techniques for…

Machine Learning · Statistics 2025-10-23 Nicolaj Rux, Michael Quellmalz, Gabriele Steidl
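As a reference point for the kernel in this entry, a minimal sketch of the plain (unsmoothed) negative-distance MMD follows; it does not implement the paper's smoothed kernels or slicing. Plugging $K(x,y) = -\|x-y\|$ into the biased (V-statistic) estimator $\widehat{\mathrm{MMD}}^2 = \frac{1}{n^2}\sum_{i,j}K(x_i,x_j) + \frac{1}{m^2}\sum_{i,j}K(y_i,y_j) - \frac{2}{nm}\sum_{i,j}K(x_i,y_j)$ gives exactly the empirical energy distance $2\,\overline{\|x-y\|} - \overline{\|x-x'\|} - \overline{\|y-y'\|}$. The function name is hypothetical.

```python
import math


def mmd2_negative_distance(xs, ys):
    """Biased (V-statistic) estimate of MMD^2 with K(x, y) = -||x - y||.

    xs, ys: lists of points given as equal-length tuples of floats.
    Uses naive double sums, so it costs O(n * m) distance evaluations.
    """
    def mean_kernel(a, b):
        # Average of K(u, v) = -||u - v|| over all pairs (u, v).
        return sum(-math.dist(u, v) for u in a for v in b) / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)
```

For example, `mmd2_negative_distance([(0.0,)], [(1.0,)])` returns `2.0`: both within-sample terms vanish and the cross term contributes $-2 \cdot (-1)$.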

Evolution of Information in Interactive Decision Making: A Case Study for Multi-Armed Bandits

We study the evolution of information in interactive decision making through the lens of a stochastic multi-armed bandit problem. Focusing on a fundamental example where a unique optimal arm outperforms the rest by a fixed margin, we…

Machine Learning · Statistics 2025-10-23 Yuzhou Gu, Yanjun Han, Jian Qian

Practical considerations for variable screening in the super learner

Estimating a prediction function is a fundamental component of many data analyses. The super learner ensemble, a particular implementation of stacking, has desirable theoretical properties and has been used successfully in many…

Machine Learning · Statistics 2025-10-23 Brian D. Williamson, Drew King, Ying Huang
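To make the cross-validation idea behind the super learner concrete, here is a minimal sketch of the *discrete* super learner: it picks the candidate learner with the smallest K-fold cross-validated squared-error risk and refits it on all the data. The full super learner instead fits a weighted combination of the candidates (stacking), and this sketch does not implement the variable-screening step the paper studies. The two candidate learners and all function names are hypothetical choices for illustration.

```python
import random


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 with a fixed seed and split into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Candidate 1: ignore x and predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Candidate 2: ordinary least squares on a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def cv_risk(fit, xs, ys, k=5):
    """K-fold cross-validated mean squared error of a fitting function."""
    sse = 0.0
    # Same seed for every candidate, so all learners see identical splits.
    for fold in kfold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sse += sum((ys[i] - model(xs[i])) ** 2 for i in fold)
    return sse / len(xs)


def discrete_super_learner(learners, xs, ys, k=5):
    """Select the learner with the lowest CV risk, then refit it on all data."""
    risks = {name: cv_risk(fit, xs, ys, k) for name, fit in learners.items()}
    best = min(risks, key=risks.get)
    return best, learners[best](xs, ys), risks
```

Usage: `discrete_super_learner({"mean": fit_mean, "linear": fit_linear}, xs, ys)` returns the winning name, the refitted model, and every candidate's CV risk.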

A Frequentist Statistical Introduction to Variational Inference, Autoencoders, and Diffusion Models

While Variational Inference (VI) is central to modern generative models like Variational Autoencoders (VAEs) and Denoising Diffusion Models (DDMs), its pedagogical treatment is split across disciplines. In statistics, VI is typically framed…

Machine Learning · Statistics 2025-10-22 Yen-Chi Chen
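The identity that links VI to VAEs can be written out as a worked equation. Standard notation is assumed here (encoder $q_\phi(z \mid x)$, decoder $p_\theta(x \mid z)$, prior $p(z)$); the entry itself does not fix notation:

$$
\log p_\theta(x)
= \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
- \mathrm{KL}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)}_{\mathrm{ELBO}(\theta,\phi;\,x)}
+ \mathrm{KL}\!\left(q_\phi(z \mid x)\,\|\,p_\theta(z \mid x)\right)
$$

Since the last KL term is nonnegative, the ELBO is a lower bound on $\log p_\theta(x)$. Maximizing it over $\phi$ tightens the bound, and maximizing it over $\theta$ approximately maximizes the likelihood.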

Interval Prediction of Annual Average Daily Traffic on Local Roads via Quantile Random Forest with High-Dimensional Spatial Data

Accurate annual average daily traffic (AADT) data are vital for transport planning and infrastructure management. However, automatic traffic detectors across national road networks often provide incomplete coverage, leading to…

Machine Learning · Statistics 2025-10-22 Ying Yao, Daniel J. Graham

Parametrising the Inhomogeneity Inducing Capacity of a Training Set, and its Impact on Supervised Learning

We introduce a parametrisation of the property of the available training dataset that necessitates an inhomogeneous correlation structure for the function learnt as a model of the relationship between the pair of variables,…

Machine Learning · Statistics 2025-10-22 Gargi Roy, Dalia Chakrabarty

The Bias-Variance Tradeoff in Data-Driven Optimization: A Local Misspecification Perspective

Data-driven stochastic optimization is ubiquitous in machine learning and operational decision-making problems. Sample average approximation (SAA) and model-based approaches such as estimate-then-optimize (ETO) or integrated…

Machine Learning · Statistics 2025-10-22 Haixiang Lan, Luofeng Liao, Adam N. Elmachtoub, Christian Kroer, Henry Lam, Haofeng Zhang
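To illustrate sample average approximation (SAA), here is a minimal sketch on the newsvendor problem, a standard textbook example chosen for illustration (the entry does not say which problems the paper studies). With overage cost $h$ and underage cost $b$ per unit, SAA replaces the expected cost $\mathbb{E}[h(q-D)^+ + b(D-q)^+]$ by its average over observed demands and minimizes that instead; a minimizer is the empirical $b/(b+h)$-quantile of demand. The function names are hypothetical.

```python
def newsvendor_cost(q, d, h, b):
    """Cost of ordering q units when demand is d (overage h, underage b per unit)."""
    return h * max(q - d, 0.0) + b * max(d - q, 0.0)


def saa_newsvendor(demands, h, b):
    """SAA order quantity: minimize the average cost over the observed demands."""
    def avg_cost(q):
        return sum(newsvendor_cost(q, d, h, b) for d in demands) / len(demands)

    # The SAA objective is convex and piecewise linear with kinks at the samples,
    # so a minimizer is attained at one of the observed demands.
    return min(sorted(demands), key=avg_cost)
```

An estimate-then-optimize (ETO) variant would instead fit a parametric demand model (for example, a normal distribution) and plug in its $b/(b+h)$-quantile as the order quantity.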

A novel Information-Driven Strategy for Optimal Regression Assessment

In Machine Learning (ML), a regression algorithm aims to minimize a loss function based on data. An assessment method in this context seeks to quantify the discrepancy between the optimal response for an input-output system and the estimate…

Machine Learning · Statistics 2025-10-22 Benjamín Castro, Camilo Ramírez, Sebastián Espinosa, Jorge F. Silva, Marcos E. Orchard, Heraldo Rozas

The analogy theorem in Hoare logic

The introduction of machine learning methods has led to significant advances in automation, optimization, and discoveries in various fields of science and technology. However, their widespread application faces a fundamental limitation: the…

Machine Learning · Statistics 2025-10-22 Nikitin Nikita

Stochastic Path Planning in Correlated Obstacle Fields

We introduce the Stochastic Correlated Obstacle Scene (SCOS) problem, a navigation setting with spatially correlated obstacles of uncertain blockage status, realistically constrained sensors that provide noisy readings and costly…

Machine Learning · Statistics 2025-10-22 Li Zhou, Elvan Ceyhan

ALINE: Joint Amortization for Bayesian Inference and Active Data Acquisition

Many critical applications, from autonomous scientific discovery to personalized medicine, demand systems that can both strategically acquire the most informative data and instantaneously perform inference based upon it. While amortized…

Machine Learning · Statistics 2025-10-22 Daolang Huang, Xinyi Wen, Ayush Bharti, Samuel Kaski, Luigi Acerbi

Nearly Dimension-Independent Convergence of Mean-Field Black-Box Variational Inference

We prove that, given a mean-field location-scale variational family, black-box variational inference (BBVI) with the reparametrization gradient converges at a rate with nearly no explicit dependence on the dimension. Specifically,…

Machine Learning · Statistics 2025-10-22 Kyurae Kim, Yi-An Ma, Trevor Campbell, Jacob R. Gardner
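A minimal sketch of the algorithm this entry analyzes: BBVI with the reparametrization gradient for a mean-field Gaussian family (the Gaussian is one location-scale choice). The step size, number of Monte Carlo samples, initialization, and function name are assumptions. Only a function returning $\nabla_z \log p(z)$ of the (possibly unnormalized) target is needed, which is what makes the method black-box. The sketch does not reproduce the paper's dimension-dependence analysis.

```python
import math
import random


def bbvi_mean_field_gaussian(grad_log_p, dim, steps=2000, lr=0.05, n_mc=10, seed=0):
    """BBVI with the reparametrization gradient, mean-field Gaussian family.

    q(z) = prod_i N(m_i, s_i^2) with s_i = exp(w_i); z = m + s * eps, eps ~ N(0, I).
    ELBO(m, w) = E_q[log p(z)] + sum_i w_i + const (the Gaussian entropy).
    Plain stochastic gradient ascent with a fixed step size.
    """
    rng = random.Random(seed)
    m = [0.0] * dim
    w = [math.log(0.1)] * dim  # start from a narrow q
    for _ in range(steps):
        s = [math.exp(wi) for wi in w]
        grad_m = [0.0] * dim
        grad_w = [0.0] * dim
        for _ in range(n_mc):
            eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            z = [mi + si * ei for mi, si, ei in zip(m, s, eps)]
            g = grad_log_p(z)  # gradient of the log target at the sampled z
            for i in range(dim):
                grad_m[i] += g[i] / n_mc                  # dz_i/dm_i = 1
                grad_w[i] += g[i] * eps[i] * s[i] / n_mc  # dz_i/dw_i = s_i * eps_i
        for i in range(dim):
            m[i] += lr * grad_m[i]
            w[i] += lr * (grad_w[i] + 1.0)  # +1: derivative of the entropy term w_i
    return m, [math.exp(wi) for wi in w]
```

For a Gaussian target $\mathcal{N}(\mu, I)$ the optimum is $m = \mu$, $s = 1$, so the returned parameters should land near those values, up to stochastic-gradient noise.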

The $\varphi$ Curve: The Shape of Generalization through the Lens of Norm-based Capacity Control

Understanding how the test risk scales with model complexity is a central question in machine learning. Classical theory is challenged by the learning curves observed for large over-parametrized deep networks. Capacity measures based on…

Machine Learning · Statistics 2025-10-22 Yichen Wang, Yudong Chen, Lorenzo Rosasco, Fanghui Liu

On Generalization and Distributional Update for Mimicking Observations with Adequate Exploration

Learning from observations (LfO) replicates expert behavior without needing access to the expert's actions, making it more practical than learning from demonstrations (LfD) in many real-world scenarios. However, directly applying the…

Machine Learning · Statistics 2025-10-22 Yirui Zhou, Yunfei Jin, Xiaowei Liu, Xiaofeng Zhang, Yangchun Zhang

Attention Meets Post-hoc Interpretability: A Mathematical Perspective

Attention-based architectures, in particular transformers, are at the heart of a technological revolution. Interestingly, in addition to helping obtain state-of-the-art results on a wide range of applications, the attention mechanism…

Machine Learning · Statistics 2025-10-22 Gianluigi Lopardo, Frederic Precioso, Damien Garreau

Understanding Post-hoc Explainers: The Case of Anchors

In many scenarios, interpretability of machine learning models is highly desirable but difficult to achieve. To explain the individual predictions of such models, local model-agnostic approaches have been proposed. However, the process…

Machine Learning · Statistics 2025-10-22 Gianluigi Lopardo, Frederic Precioso, Damien Garreau

Towards Instance-Wise Calibration: Local Amortized Diagnostics and Reshaping of Conditional Densities (LADaR)

Key science questions, such as galaxy distance estimation and weather forecasting, often require knowing the full predictive distribution of a target variable $y$ given complex inputs $\mathbf{x}$. Despite recent advances in machine…

Machine Learning · Statistics 2025-10-22 Biprateep Dey, David Zhao, Brett H. Andrews, Jeffrey A. Newman, Rafael Izbicki, Ann B. Lee

A Sea of Words: An In-Depth Analysis of Anchors for Text Data

Anchors (Ribeiro et al., 2018) is a post-hoc, rule-based interpretability method. For text data, it proposes to explain a decision by highlighting a small set of words (an anchor) such that the model to explain has similar outputs when they…

Machine Learning · Statistics 2025-10-22 Gianluigi Lopardo, Frederic Precioso, Damien Garreau
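The property an anchor must satisfy ("similar outputs when they are present") is usually measured as *precision*. Here is a minimal Monte Carlo sketch of estimating the precision of one given anchor. The perturbation scheme (each word outside the anchor independently kept with probability 1/2, otherwise replaced by an `"UNK"` token) and all names are assumptions of this sketch. The full Anchors method also searches over candidate anchors, which is not shown here.

```python
import random


def anchor_precision(words, anchor, classify, n_samples=1000, p_keep=0.5, seed=0):
    """Monte Carlo estimate of an anchor's precision on one text.

    Assumed perturbation: every word in `anchor` is kept; every other word is
    independently kept with probability p_keep, otherwise replaced by "UNK".
    Precision = fraction of perturbed texts on which `classify` returns the
    same label as on the original text.
    """
    rng = random.Random(seed)
    target = classify(words)
    hits = 0
    for _ in range(n_samples):
        perturbed = [
            w if (w in anchor or rng.random() < p_keep) else "UNK" for w in words
        ]
        if classify(perturbed) == target:
            hits += 1
    return hits / n_samples
```

With a toy classifier that predicts positive whenever "good" appears, the anchor `{"good"}` has precision 1.0 on "the movie was good", while the empty anchor has precision near 1/2.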

Non-asymptotic error bounds for probability flow ODEs under weak log-concavity

Score-based generative modeling, implemented through probability flow ODEs, has shown impressive results in numerous practical settings. However, most convergence guarantees rely on restrictive regularity assumptions on the target…

Machine Learning · Statistics 2025-10-21 Gitte Kremling, Francesco Iafrate, Mahsa Taheri, Johannes Lederer
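For intuition about the object this entry analyzes, here is a minimal sketch that integrates a probability flow ODE when the score is known exactly. It uses 1-D Gaussian data $\mathcal{N}(m_0, s_0^2)$ noised by the Ornstein-Uhlenbeck process $dX_t = -X_t\,dt + \sqrt{2}\,dW_t$, so that $X_t \sim \mathcal{N}(m_t, v_t)$ with $m_t = m_0 e^{-t}$ and $v_t = s_0^2 e^{-2t} + 1 - e^{-2t}$. The score is $-(x - m_t)/v_t$, and the probability flow ODE $\dot{x} = f(x,t) - \tfrac{1}{2}g^2\nabla_x \log p_t(x)$ becomes $\dot{x} = -x - \nabla_x \log p_t(x)$. In real score-based models the score is a learned network and the ODE is started from $\mathcal{N}(0,1)$. Here both are exact, and a Gaussian target is strongly (not merely weakly) log-concave, so this is only a sanity check, not the paper's setting. The names and the explicit Euler discretization are assumptions of the sketch.

```python
import math


def pf_ode_sample(z, m0, s0, T=5.0, n_steps=2000):
    """Map a standard-normal value z through the probability flow ODE.

    Forward process: OU dX = -X dt + sqrt(2) dW applied to N(m0, s0^2) data.
    Integrates the ODE backwards from t = T to t = 0 with explicit Euler steps.
    """
    def mean_var(t):
        e = math.exp(-t)
        return m0 * e, s0 ** 2 * e ** 2 + 1.0 - e ** 2

    def velocity(x, t):
        m_t, v_t = mean_var(t)
        score = -(x - m_t) / v_t  # exact score of N(m_t, v_t)
        return -x - score         # f(x, t) - 0.5 * g^2 * score with f = -x, g^2 = 2

    m_T, v_T = mean_var(T)
    x = m_T + math.sqrt(v_T) * z  # start at the z-quantile of p_T
    h = T / n_steps
    t = T
    for _ in range(n_steps):
        x -= h * velocity(x, t)   # one Euler step backwards in time
        t -= h
    return x
```

Because the flow transports $p_T$ to $p_0$, starting at the $z$-quantile of $p_T$ should land, up to discretization error, at $m_0 + s_0 z$, the matching quantile of the data distribution.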