Machine Learning - Scifaro

ReLU integral probability metric and its applications

We propose a parametric integral probability metric (IPM) to measure the discrepancy between two probability measures. The proposed IPM leverages a specific parametric family of discriminators, such as single-node neural networks with ReLU…

Machine Learning · Statistics 2025-04-29 Yuha Park, Kunwoong Kim, Insung Kong, Yongdai Kim

A Dictionary of Closed-Form Kernel Mean Embeddings

Kernel mean embeddings (integrals of a kernel with respect to a probability distribution) are essential in Bayesian quadrature, but also widely used in other computational tools for numerical integration or for statistical inference…

Machine Learning · Statistics 2025-04-29 François-Xavier Briol, Alexandra Gessner, Toni Karvonen, Maren Mahsereci

Local Polynomial Lp-norm Regression

The local least squares estimator for a regression curve cannot provide optimal results when non-Gaussian noise is present. Both theoretical and empirical evidence suggests that residuals often exhibit distributional properties different…

Machine Learning · Statistics 2025-04-29 Ladan Tazik, James Stafford, John Braun

Foundations of Safe Online Reinforcement Learning in the Linear Quadratic Regulator: $\sqrt{T}$-Regret

Understanding how to efficiently learn while adhering to safety constraints is essential for using online reinforcement learning in practical applications. However, proving rigorous regret bounds for safety-constrained reinforcement…

Machine Learning · Statistics 2025-04-29 Benjamin Schiffer, Lucas Janson

Statistical Inference for Clustering-based Anomaly Detection

Unsupervised anomaly detection (AD) is a fundamental problem in machine learning and statistics. A popular approach to unsupervised AD is clustering-based detection. However, this method lacks the ability to guarantee the reliability of the…

Machine Learning · Statistics 2025-04-29 Nguyen Thi Minh Phu, Duong Tan Loc, Vo Nguyen Le Duy

Likelihood-Free Variational Autoencoders

Variational Autoencoders (VAEs) typically rely on a probabilistic decoder with a predefined likelihood, most commonly an isotropic Gaussian, to model the data conditional on latent variables. While convenient for optimization, this choice…

Machine Learning · Statistics 2025-04-29 Chen Xu, Qiang Wang, Lijun Sun

A Meta-learner for Heterogeneous Effects in Difference-in-Differences

We address the problem of estimating heterogeneous treatment effects in panel data, adopting the popular Difference-in-Differences (DiD) framework under the conditional parallel trends assumption. We propose a novel doubly robust…

Machine Learning · Statistics 2025-04-29 Hui Lan, Haoge Chang, Eleanor Dillon, Vasilis Syrgkanis

GeoConformal prediction: a model-agnostic framework of measuring the uncertainty of spatial prediction

Spatial prediction is a fundamental task in geography. In recent years, with advances in geospatial artificial intelligence (GeoAI), numerous models have been developed to improve the accuracy of geographic variable predictions. Beyond…

Machine Learning · Statistics 2025-04-29 Xiayin Lou, Peng Luo, Liqiu Meng

Causal-discovery-based root-cause analysis and its application in time-series prediction error diagnosis

Recent rapid advancements of machine learning have greatly enhanced the accuracy of prediction models, but most models remain "black boxes", making prediction error diagnosis challenging, especially with outliers. This lack of transparency…

Machine Learning · Statistics 2025-04-29 Hiroshi Yokoyama, Ryusei Shingaki, Kaneharu Nishino, Shohei Shimizu, Thong Pham

Adaptive Sample Aggregation In Transfer Learning

Transfer Learning aims to optimally aggregate samples from a target distribution, with related samples from a so-called source distribution to improve target risk. Multiple procedures have been proposed over the last two decades to address…

Machine Learning · Statistics 2025-04-29 Steve Hanneke, Samory Kpotufe

Building a stable classifier with the inflated argmax

We propose a new framework for algorithmic stability in the context of multiclass classification. In practice, classification algorithms often operate by first assigning a continuous score (for instance, an estimated probability) to each…

Machine Learning · Statistics 2025-04-29 Jake A. Soloff, Rina Foygel Barber, Rebecca Willett

Causal Q-Aggregation for CATE Model Selection

Accurate estimation of conditional average treatment effects (CATE) is at the core of personalized decision making. While there is a plethora of models for CATE estimation, model selection is a nontrivial task, due to the fundamental…

Machine Learning · Statistics 2025-04-29 Hui Lan, Vasilis Syrgkanis

Enhancing Visual Interpretability and Explainability in Functional Survival Trees and Forests

Functional survival models are key tools for analyzing time-to-event data with complex predictors, such as functional or high-dimensional inputs. Despite their predictive strength, these models often lack interpretability, which limits…

Machine Learning · Statistics 2025-04-28 Giuseppe Loffredo, Elvira Romano, Fabrizio Maturo

Generalization Guarantees for Multi-View Representation Learning and Application to Regularization via Gaussian Product Mixture Prior

We study the problem of distributed multi-view representation learning. In this problem, $K$ agents each observe one distinct, possibly statistically correlated, view and independently extract from it a suitable representation in a manner…

Machine Learning · Statistics 2025-04-28 Milad Sefidgaran, Abdellatif Zaidi, Piotr Krasnowski

Post-Transfer Learning Statistical Inference in High-Dimensional Regression

Transfer learning (TL) for high-dimensional regression (HDR) is an important problem in machine learning, particularly when dealing with limited sample size in the target task. However, there is currently no method to quantify the…

Machine Learning · Statistics 2025-04-28 Nguyen Vu Khai Tam, Cao Huyen My, Vo Nguyen Le Duy

Efficient Budget Allocation for Large-Scale LLM-Enabled Virtual Screening

Screening tasks that aim to identify a small subset of top alternatives from a large pool are common in business decision-making processes. These tasks often require substantial human effort to evaluate each alternative's performance,…

Machine Learning · Statistics 2025-04-28 Zaile Li, Weiwei Fan, L. Jeff Hong

Robust Kernel Hypothesis Testing under Data Corruption

We propose a general method for constructing robust permutation tests under data corruption. The proposed tests effectively control the non-asymptotic type I error under data corruption, and we prove their consistency in power under minimal…

Machine Learning · Statistics 2025-04-28 Antonin Schrab, Ilmun Kim

Prior-Dependent Allocations for Bayesian Fixed-Budget Best-Arm Identification in Structured Bandits

We study the problem of Bayesian fixed-budget best-arm identification (BAI) in structured bandits. We propose an algorithm that uses fixed allocations based on the prior information and the structure of the environment. We provide…

Machine Learning · Statistics 2025-04-28 Nicolas Nguyen, Imad Aouali, András György, Claire Vernade

Evaluating Uncertainty in Deep Gaussian Processes

Reliable uncertainty estimates are crucial in modern machine learning. Deep Gaussian Processes (DGPs) and Deep Sigma Point Processes (DSPPs) extend GPs hierarchically, offering promising methods for uncertainty quantification grounded in…

Machine Learning · Statistics 2025-04-25 Matthijs van der Lende, Jeremias Lino Ferrao, Niclas Müller-Hof

Causal rule ensemble approach for multi-arm data

Heterogeneous treatment effect (HTE) estimation is critical in medical research. It provides insights into how treatment effects vary among individuals, which can provide statistical evidence for precision medicine. While most existing…

Machine Learning · Statistics 2025-04-25 Ke Wan, Kensuke Tanioka, Toshio Shimokawa