Machine Learning - Scifaro

A Practical Theory of Generalization in Selectivity Learning

Query-driven machine learning models have emerged as a promising estimation technique for query selectivities. Yet, surprisingly little is known about the efficacy of these techniques from a theoretical perspective, as there exist…

Machine Learning · Statistics 2026-05-19 Peizhi Wu, Haoshu Xu, Ryan Marcus, Zachary G. Ives

Generalization analysis with deep ReLU networks for metric and similarity learning

While metric and similarity learning has been extensively studied from several theoretical perspectives, a rigorous understanding of its generalization performance is still lacking. In this paper, we investigate the generalization behavior…

Machine Learning · Statistics 2026-05-19 Junyu Zhou, Puyu Wang, Ding-Xuan Zhou

Matérn Gaussian Processes on Graphs

Gaussian processes are a versatile framework for learning unknown functions in a manner that permits one to utilize prior information about their properties. Although many different Gaussian process models are readily available when the…

Machine Learning · Statistics 2026-05-19 Viacheslav Borovitskiy, Iskander Azangulov, Alexander Terenin, Peter Mostowsky, Marc Peter Deisenroth, Nicolas Durrande

A Scalable Nonparametric Continuous-Time Survival Model through Numerical Quadrature

Flexible continuous-time survival modeling is critical for capturing complex time-varying hazard dynamics in high-dimensional data; however, training such models remains challenging due to the intractable integral required for likelihood…

Machine Learning · Statistics 2026-05-18 Chaeyeon Lee, Sehwan Kim, Hyungrok Do

Skew-adaptive conformal prediction

We develop a skew-adaptive extension of split conformal prediction for regression. The method starts from an asymmetric interval family centered at a point prediction and uses the gauge approach to deduce the conformity score induced by…

Machine Learning · Statistics 2026-05-18 Paulo C. Marques F., Helton Graziadei

A numerical study into neural network surrogate model performance for uncertainty propagation

Neural network surrogate models have emerged as a promising approach to model solution fields for a wide variety of boundary value problems encountered in physical modeling. Stochastic problems represent an area of particularly high…

Machine Learning · Statistics 2026-05-18 Noah Wade, Kirubel Teferra

Explainable AI Isn't Enough! Rethinking Algorithmic Contestability

Machine learning systems increasingly make life-changing decisions about individuals, such as loan approvals, hiring, and cheating detection, raising a pressing question: how can individuals respond to negative decisions made by these…

Machine Learning · Statistics 2026-05-18 Timo Freiesleben, Kristof Meding, Gunnar König

Testing properties of trees in graphical models with covariance queries

We consider the problem of testing properties of graphs underlying high-dimensional graphical models. We adopt the model of covariance queries introduced by Lugosi, Truszkowski, Velona, and Zwiernik (2021). We study the case when the…

Machine Learning · Statistics 2026-05-18 Sofiya Burova, Francisco Calvillo, Gábor Lugosi, Piotr Zwiernik

Unsupervised Domain Shift Detection with Interpretable Subspace Attribution

We developed a tool for detecting domain shifts, namely subtle differences in the probability distributions of datasets. We identify these shifts using an algorithm designed to detect localised density anomalies in high-dimensional feature…

Machine Learning · Statistics 2026-05-18 Sebastian Springer, Alessandro Laio

$\alpha$-TCAV: A Unified Framework for Testing with Concept Activation Vectors

Concept Activation Vectors (CAVs) are a fundamental tool for concept-based explainability in deep learning, yet their practical utility is limited by statistical instability. We analyze the stochastic nature of CAVs and the Testing with…

Machine Learning · Statistics 2026-05-18 Ekkehard Schnoor, Jawher Said, Malik Tiomoko, Wojciech Samek, Alexander Jung

Pessimistic Risk-Aware Policy Learning in Contextual Bandits

We study risk-aware offline policy learning, aiming to learn a decision rule from logged data that is optimal under general risk criteria. This problem is crucial in high-stakes domains where online interaction is infeasible and adverse…

Machine Learning · Statistics 2026-05-18 Yilong Wan, Yuqiang Li, Xianyi Wu

MaxSketch: Robust Distinct Counting in Streams via Random Projections

Estimating the number of distinct elements in a data stream is well understood when repeated elements are identical. In modern settings, however, observations are high-dimensional and noisy, so repeated instances of the same object are only…

Machine Learning · Statistics 2026-05-18 Nikos Tsikouras, Constantine Caramanis, Christos Tzamos

Harnessing Unimodality in Semiparametric Contextual Pricing via Oracle Price Map Learning

We study contextual dynamic pricing in a semiparametric scalar-index valuation model where the latent value is $v_t=\mu_\ast(\mathsf c_t)+\xi_t$, with an unknown utility map $\mu_\ast$ and an unknown additive noise distribution. The key…

Machine Learning · Statistics 2026-05-18 Yingying Fan, Yuxuan Han, Jinchi Lv, Xiaocong Xu, Zhengyuan Zhou

On Kernel Eigen-alignments of KRR: Reconstruction and Generalization

This paper investigates the critical role of eigenalignments between the kernel matrix and learning targets in achieving robust generalization in learning problems. We establish a direct connection between generalization performance in…

Machine Learning · Statistics 2026-05-18 Yang Liu, Ernest Fokoue, Richard Lange, Daniel Krutz

Logging Policy Design for Off-Policy Evaluation

Off-policy evaluation (OPE) estimates the value of a target treatment policy (e.g., a recommender system) using data collected by a different logging policy. It enables high-stakes experimentation without live deployment, yet in practice…

Machine Learning · Statistics 2026-05-18 Connor Douglas, Joel Persson, Foster Provost

On the Burden of Achieving Fairness in Conformal Prediction

Conformal prediction is often calibrated with a single pooled threshold, but this can hide cross-group heterogeneity in score distributions and distort group-wise coverage. We study this phenomenon through the population score distributions…

Machine Learning · Statistics 2026-05-18 Ziang Gao, Pengqi Liu, Archer Yi Yang, Mouloud Belbahri, Jesse C. Cresswell, Masoud Asgharian

Neural Backward Filtering Forward Guiding

Inference in nonlinear continuous stochastic processes on trees is challenging, particularly when observations are sparse and the topology is complex. Exact smoothing via Doob's $h$-transform is intractable for general nonlinear dynamics.…

Machine Learning · Statistics 2026-05-18 Gefan Yang, Frank van der Meulen, Stefan Sommer

Vector-valued self-normalized concentration inequalities beyond sub-Gaussianity

The study of self-normalized processes plays a crucial role in a wide range of applications, from sequential decision-making to econometrics. While the behavior of self-normalized concentration has been widely investigated for scalar-valued…

Machine Learning · Statistics 2026-05-18 Diego Martinez-Taboada, Tomas Gonzalez, Aaditya Ramdas

Preconditioned Regularized Wasserstein Proximal Sampling

We consider sampling from a Gibbs distribution by evolving finitely many particles. We propose a preconditioned version of a recently proposed noise-free sampling method, governed by approximating the score function with the numerically…

Machine Learning · Statistics 2026-05-18 Hong Ye Tan, Stanley Osher, Wuchen Li

Overfitting has a limitation: a model-independent generalization gap bound based on Rényi entropy

Will further scaling up of machine learning models continue to bring success? A significant challenge in answering this question lies in understanding the generalization gap, which is the impact of overfitting. Understanding the generalization gap…

Machine Learning · Statistics 2026-05-18 Atsushi Suzuki, Jing Wang