Machine Learning - Scifaro

Thompson Sampling in Function Spaces via Neural Operators

We propose an extension of Thompson sampling to optimization problems over function spaces where the objective is a known functional of an unknown operator's output. We assume that queries to the operator (such as running a high-fidelity…

Machine Learning · Statistics 2026-01-21 Rafael Oliveira, Xuesong Wang, Kian Ming A. Chai, Edwin V. Bonilla

ALPCAHUS: Subspace Clustering for Heteroscedastic Data

Principal component analysis (PCA) is a key tool in the field of data dimensionality reduction. Various methods have been proposed to extend PCA to the union of subspaces (UoS) setting for clustering data that comes from multiple subspaces…

Machine Learning · Statistics 2026-01-21 Javier Salazar Cavazos, Jeffrey A Fessler, Laura Balzano

Post-Hoc Uncertainty Quantification in Pre-Trained Neural Networks via Activation-Level Gaussian Processes

Uncertainty quantification in neural networks through methods such as Dropout, Bayesian neural networks and Laplace approximations is either prone to underfitting or computationally demanding, rendering these approaches impractical for…

Machine Learning · Statistics 2026-01-21 Richard Bergna, Stefan Depeweg, Sergio Calvo Ordonez, Jonathan Plenk, Alvaro Cartea, Jose Miguel Hernandez-Lobato

Variable transformations in consistent loss functions

The empirical use of variable transformations within (strictly) consistent loss functions is widespread, yet a theoretical understanding is lacking. To address this gap, we develop a theoretical framework that establishes formal…

Machine Learning · Statistics 2026-01-21 Hristos Tyralis, Georgia Papacharalampous

A survey on Clustered Federated Learning: Taxonomy, Analysis and Applications

As Federated Learning (FL) expands, the challenge of non-independent and identically distributed (non-IID) data becomes critical. Clustered Federated Learning (CFL) addresses this by training multiple specialized models, each representing a…

Machine Learning · Statistics 2026-01-21 Michael Ben Ali, Omar El-Rifai, Imen Megdiche, André Peninou, Olivier Teste

Another look at statistical inference with machine learning-imputed data

From structural biology to epidemiology, predictions from machine learning (ML) models increasingly complement costly gold-standard data, enabling faster, more affordable, and scalable scientific inquiry. In response, prediction-based (PB)…

Machine Learning · Statistics 2026-01-21 Jessica Gronsbell, Jianhui Gao, Zachary R. McCaw, Yaqi Shi, David Cheng

U-learning for Prediction Inference via Combinatory Multi-Subsampling: With Applications to LASSO and Neural Networks

Epigenetic aging clocks play a pivotal role in estimating an individual's biological age through the examination of DNA methylation patterns at numerous CpG (Cytosine-phosphate-Guanine) sites within their genome. However, making valid…

Machine Learning · Statistics 2026-01-21 Zhe Fei, Yi Li

Deep Functional Factor Models: Forecasting High-Dimensional Functional Time Series via Bayesian Nonparametric Factorization

This paper introduces the Deep Functional Factor Model (DF2M), a Bayesian nonparametric model designed for analysis of high-dimensional functional time series. DF2M is built upon the Indian Buffet Process and the multi-task Gaussian…

Machine Learning · Statistics 2026-01-21 Yirui Liu, Xinghao Qiao, Yulong Pei, Liying Wang

Bias and Extrapolation in Markovian Linear Stochastic Approximation with Constant Stepsizes

We consider Linear Stochastic Approximation (LSA) with a constant stepsize and Markovian data. Viewing the joint process of the data and LSA iterate as a time-homogeneous Markov chain, we prove its convergence to a unique limiting and…

Machine Learning · Statistics 2026-01-21 Dongyan Huo, Yudong Chen, Qiaomin Xie

Generalization Bounds for Sparse Random Feature Expansions

Random feature methods have been successful in various machine learning tasks, are easy to compute, and come with theoretical accuracy bounds. They serve as an alternative approach to standard neural networks since they can represent…

Machine Learning · Statistics 2026-01-21 Abolfazl Hashemi, Hayden Schaeffer, Robert Shi, Ufuk Topcu, Giang Tran, Rachel Ward

Local Minima Structures in Gaussian Mixture Models

We investigate the landscape of the negative log-likelihood function of Gaussian Mixture Models (GMMs) with a general number of components in the population limit. As the objective function is non-convex, there can be multiple local minima…

Machine Learning · Statistics 2026-01-21 Yudong Chen, Dogyoon Song, Xumei Xi, Yuqian Zhang

Split-and-Conquer: Distributed Factor Modeling for High-Dimensional Matrix-Variate Time Series

In this paper, we propose a distributed framework for reducing the dimensionality of high-dimensional, large-scale, heterogeneous matrix-variate time series data using a factor model. The data are first partitioned column-wise (or row-wise)…

Machine Learning · Statistics 2026-01-19 Hangjin Jiang, Yuzhou Li, Zhaoxing Gao

Accelerated Regularized Wasserstein Proximal Sampling Algorithms

We consider sampling from a Gibbs distribution by evolving a finite number of particles using a particular score estimator rather than Brownian motion. To accelerate the particles, we consider a second-order score-based ODE, similar to…

Machine Learning · Statistics 2026-01-19 Hong Ye Tan, Stanley Osher, Wuchen Li

Transfer Learning for Benign Overfitting in High-Dimensional Linear Regression

Transfer learning is a key component of modern machine learning, enhancing the performance of target tasks by leveraging diverse data sources. Simultaneously, overparameterized models such as the minimum-$\ell_2$-norm interpolator (MNI) in…

Machine Learning · Statistics 2026-01-19 Yeichan Kim, Ilmun Kim, Seyoung Park

FEAT: Free energy Estimators with Adaptive Transport

We present Free energy Estimators with Adaptive Transport (FEAT), a novel framework for free energy estimation, a critical challenge across scientific domains. FEAT leverages learned transports implemented via stochastic interpolants and…

Machine Learning · Statistics 2026-01-19 Jiajun He, Yuanqi Du, Francisco Vargas, Yuanqing Wang, Carla P. Gomes, José Miguel Hernández-Lobato, Eric Vanden-Eijnden

Conditional Distribution Compression via the Kernel Conditional Mean Embedding

Existing distribution compression methods, like Kernel Herding (KH), were originally developed for unlabelled data. However, no existing approach directly compresses the conditional distribution of labelled data. To address this…

Machine Learning · Statistics 2026-01-19 Dominic Broadbent, Nick Whiteley, Robert Allison, Tom Lovett

Universal Architectures for the Learning of Polyhedral Norms and Convex Regularizers

This paper addresses the task of learning convex regularizers to guide the reconstruction of images from limited data. By imposing that the reconstruction be amplitude-equivariant, we narrow down the class of admissible functionals to those…

Machine Learning · Statistics 2026-01-19 Michael Unser, Stanislas Ducotterd

High-Dimensional Tail Index Regression

Motivated by the empirical observation of power-law distributions in the credits (e.g., "likes") of viral posts in social media, we introduce a high-dimensional tail index regression model and propose methods for estimation and inference…

Machine Learning · Statistics 2026-01-19 Yuya Sasaki, Jing Tao, Yulong Wang

Classification Imbalance as Transfer Learning

Classification imbalance arises when one class is much rarer than the other. We frame this setting as transfer learning under label (prior) shift between an imbalanced source distribution induced by the observed data and a balanced target…

Machine Learning · Statistics 2026-01-16 Eric Xia, Jason M. Klusowski

Parametric RDT approach to computational gap of symmetric binary perceptron

We study the potential presence of statistical-computational gaps (SCG) in symmetric binary perceptrons (SBP) via a parametric utilization of fully lifted random duality theory (fl-RDT) [96]. A structural change from decreasingly to…

Machine Learning · Statistics 2026-01-16 Mihailo Stojnic