Machine Learning - Scifaro

Modeling Spatial Extremes using Non-Gaussian Spatial Autoregressive Models via Convolutional Neural Networks

Data derived from remote sensing or numerical simulations often have a regular gridded structure and are large in volume, making it challenging to find accurate spatial models that can fill in missing grid cells or simulate the process…

Machine Learning · Statistics 2025-05-07 Sweta Rai, Douglas W. Nychka, Soutir Bandyopadhyay

GeoERM: Geometry-Aware Multi-Task Representation Learning on Riemannian Manifolds

Multi-Task Learning (MTL) seeks to boost statistical power and learning efficiency by discovering structure shared across related tasks. State-of-the-art MTL representation methods, however, usually treat the latent representation matrix as…

Machine Learning · Statistics 2025-05-07 Aoran Chen, Yang Feng

E-Values Expand the Scope of Conformal Prediction

Conformal prediction is a powerful framework for distribution-free uncertainty quantification. The standard approach to conformal prediction relies on comparing the ranks of prediction scores: under exchangeability, the rank of a future…

Machine Learning · Statistics 2025-05-07 Etienne Gauthier, Francis Bach, Michael I. Jordan

Taking a Big Step: Large Learning Rates in Denoising Score Matching Prevent Memorization

Denoising score matching plays a pivotal role in the performance of diffusion-based generative models. However, the empirical optimal score (the exact solution to the denoising score matching) leads to memorization, where generated samples…

Machine Learning · Statistics 2025-05-07 Yu-Han Wu, Pierre Marion, Gérard Biau, Claire Boyer

Optimal Transport-based Conformal Prediction

Conformal Prediction (CP) is a principled framework for quantifying uncertainty in blackbox learning models, by constructing prediction sets with finite-sample coverage guarantees. Traditional approaches rely on scalar nonconformity scores,…

Machine Learning · Statistics 2025-05-07 Gauthier Thurin, Kimia Nadjahi, Claire Boyer

Queueing Matching Bandits with Preference Feedback

In this study, we consider multi-class multi-server asymmetric queueing systems consisting of $N$ queues on one side and $K$ servers on the other side, where jobs randomly arrive in queues at each time. The service rate of each job-server…

Machine Learning · Statistics 2025-05-07 Jung-hun Kim, Min-hwan Oh

Strong Screening Rules for Group-based SLOPE Models

Tuning the regularization parameter in penalized regression models is an expensive task, requiring multiple models to be fit along a path of parameters. Strong screening rules drastically reduce computational costs by lowering the…

Machine Learning · Statistics 2025-05-07 Fabio Feser, Marina Evangelou

Ellipsoid fitting with the Cayley transform

We introduce Cayley transform ellipsoid fitting (CTEF), an algorithm that uses the Cayley transform to fit ellipsoids to noisy data in any dimension. Unlike many ellipsoid fitting methods, CTEF is ellipsoid specific, meaning it always…

Machine Learning · Statistics 2025-05-07 Omar Melikechi, David B. Dunson

Truncated LinUCB for Stochastic Linear Bandits

This paper considers contextual bandits with a finite number of arms, where the contexts are independent and identically distributed $d$-dimensional random vectors, and the expected rewards are linear in both the arm parameters and…

Machine Learning · Statistics 2025-05-07 Yanglei Song, Meng Zhou

Extended Fiducial Inference for Individual Treatment Effects via Deep Neural Networks

Individual treatment effect estimation has gained significant attention in recent data science literature. This work introduces the Double Neural Network (Double-NN) method to address this problem within the framework of extended fiducial…

Machine Learning · Statistics 2025-05-06 Sehwan Kim, Faming Liang

Bayesian learning of the optimal action-value function in a Markov decision process

The Markov Decision Process (MDP) is a popular framework for sequential decision-making problems, and uncertainty quantification is an essential component of it to learn optimal decision-making strategies. In particular, a Bayesian…

Machine Learning · Statistics 2025-05-06 Jiaqi Guo, Chon Wai Ho, Sumeetpal S. Singh

TV-SurvCaus: Dynamic Representation Balancing for Causal Survival Analysis

Estimating the causal effect of time-varying treatments on survival outcomes is a challenging task in many domains, particularly in medicine where treatment protocols adapt over time. While recent advances in representation learning have…

Machine Learning · Statistics 2025-05-06 Ayoub Abraich

SoftCVI: Contrastive variational inference with self-generated soft labels

Estimating a distribution given access to its unnormalized density is pivotal in Bayesian inference, where the posterior is generally known only up to an unknown normalizing constant. Variational inference and Markov chain Monte Carlo…

Machine Learning · Statistics 2025-05-06 Daniel Ward, Mark Beaumont, Matteo Fasiolo

Statistical Agnostic Regression: a machine learning method to validate regression models

Regression analysis is a central topic in statistical modeling, aimed at estimating the relationships between a dependent variable, commonly referred to as the response variable, and one or more independent variables, i.e., explanatory…

Machine Learning · Statistics 2025-05-06 Juan M Gorriz, J. Ramirez, F. Segovia, F. J. Martinez-Murcia, C. Jiménez-Mesa, J. Suckling

Robust Transfer Learning with Unreliable Source Data

This paper addresses challenges in robust transfer learning stemming from ambiguity in Bayes classifiers and weak transferable signals between the target and source distribution. We introduce a novel quantity called the "ambiguity level"…

Machine Learning · Statistics 2025-05-06 Jianqing Fan, Cheng Gao, Jason M. Klusowski

Wasserstein Distributionally Robust Estimation in High Dimensions: Performance Analysis and Optimal Hyperparameter Tuning

Distributionally robust optimization (DRO) has become a powerful framework for estimation under uncertainty, offering strong out-of-sample performance and principled regularization. In this paper, we propose a DRO-based method for linear…

Machine Learning · Statistics 2025-05-06 Liviu Aolaritei, Soroosh Shafiee, Florian Dörfler

Overparametrized linear dimensionality reductions: From projection pursuit to two-layer neural networks

Given a cloud of $n$ data points in $\mathbb{R}^d$, consider all projections onto $m$-dimensional subspaces of $\mathbb{R}^d$ and, for each such projection, the empirical distribution of the projected points. What does this collection of…

Machine Learning · Statistics 2025-05-06 Andrea Montanari, Kangjie Zhou

Heavy-Tail Phenomenon in Decentralized SGD

Recent theoretical studies have shown that heavy tails can emerge in stochastic optimization due to "multiplicative noise", even under surprisingly simple settings, such as linear regression with Gaussian data. While these studies have…

Machine Learning · Statistics 2025-05-06 Mert Gurbuzbalaban, Yuanhan Hu, Umut Simsekli, Kun Yuan, Lingjiong Zhu

Provable Efficiency of Guidance in Diffusion Models for General Data Distribution

Diffusion models have emerged as a powerful framework for generative modeling, with guidance techniques playing a crucial role in enhancing sample quality. Despite their empirical success, a comprehensive theoretical understanding of the…

Machine Learning · Statistics 2025-05-05 Gen Li, Yuchen Jiao

Gaussian Differential Private Bootstrap by Subsampling

Bootstrap is a common tool for quantifying uncertainty in data analysis. However, besides additional computational costs in the application of the bootstrap on massive data, a challenging problem in bootstrap based inference under…

Machine Learning · Statistics 2025-05-05 Holger Dette, Carina Graw