Machine Learning - Scifaro

Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits

It has become increasingly common for data to be collected adaptively, for example using contextual bandits. Historical data of this type can be used to evaluate other treatment assignment policies to guide future innovation or experiments.…

Machine Learning · Statistics 2025-09-22 Ruohan Zhan, Vitor Hadad, David A. Hirshberg, Susan Athey

Asymptotic Study of In-context Learning with Random Transformers through Equivalent Models

We study the in-context learning (ICL) capabilities of pretrained Transformers in the setting of nonlinear regression. Specifically, we focus on a random Transformer with a nonlinear MLP head where the first layer is randomly initialized…

Machine Learning · Statistics 2025-09-19 Samet Demir, Zafer Dogan

Next-Depth Lookahead Tree

This paper proposes the Next-Depth Lookahead Tree (NDLT), a single-tree model designed to improve performance by evaluating the quality of node splits not only at the node being optimized but also at the next depth level.

Machine Learning · Statistics 2025-09-19 Jaeho Lee, Kangjin Kim, Gyeong Taek Lee

Benefits of Online Tilted Empirical Risk Minimization: A Case Study of Outlier Detection and Robust Regression

Empirical Risk Minimization (ERM) is a foundational framework for supervised learning but primarily optimizes average-case performance, often neglecting fairness and robustness considerations. Tilted Empirical Risk Minimization (TERM)…

Machine Learning · Statistics 2025-09-19 Yigit E. Yildirim, Samet Demir, Zafer Dogan

Learning Rate Should Scale Inversely with High-Order Data Moments in High-Dimensional Online Independent Component Analysis

We investigate the impact of high-order moments on the learning dynamics of an online Independent Component Analysis (ICA) algorithm under a high-dimensional data model composed of a weighted sum of two non-Gaussian random variables. This…

Machine Learning · Statistics 2025-09-19 M. Oguzhan Gultekin, Samet Demir, Zafer Dogan

Gap-Dependent Bounds for Federated $Q$-learning

We present the first gap-dependent analysis of regret and communication cost for on-policy federated $Q$-Learning in tabular episodic finite-horizon Markov decision processes (MDPs). Existing FRL methods focus on worst-case scenarios,…

Machine Learning · Statistics 2025-09-19 Haochen Zhang, Zhong Zheng, Lingzhou Xue

Double Descent: Understanding Linear Model Estimation of Nonidentifiable Parameters and a Model for Overfitting

We consider ordinary least squares estimation and variations on least squares estimation such as penalized (regularized) least squares and spectral shrinkage estimates for problems with p > n and associated problems with prediction of new…

Machine Learning · Statistics 2025-09-19 Ronald Christensen

On the Rate of Gaussian Approximation for Linear Regression Problems

In this paper, we consider the problem of Gaussian approximation for the online linear regression task. We derive the corresponding rates for the setting of a constant learning rate and study the explicit dependence of the convergence rate…

Machine Learning · Statistics 2025-09-18 Marat Khusainov, Marina Sheshukova, Alain Durmus, Sergey Samsonov

Causal Clustering for Conditional Average Treatment Effects Estimation and Subgroup Discovery

Estimating heterogeneous treatment effects is critical in domains such as personalized medicine, resource allocation, and policy evaluation. A central challenge lies in identifying subpopulations that respond differently to interventions,…

Machine Learning · Statistics 2025-09-18 Zilong Wang, Turgay Ayer, Shihao Yang

Continuous Temporal Learning of Probability Distributions via Neural ODEs with Applications in Continuous Glucose Monitoring Data

Modeling the dynamics of probability distributions from time-dependent data samples is a fundamental problem in many fields, including digital health. The goal is to analyze how the distribution of a biomarker, such as glucose, changes over…

Machine Learning · Statistics 2025-09-18 Antonio Álvarez-López, Marcos Matabuena

High-Dimensional Gaussian Process Regression with Soft Kernel Interpolation

We introduce Soft Kernel Interpolation (SoftKI), a method that combines aspects of Structured Kernel Interpolation (SKI) and variational inducing point methods, to achieve scalable Gaussian Process (GP) regression on high-dimensional…

Machine Learning · Statistics 2025-09-18 Chris Camaño, Daniel Huang

Dual Feature Reduction for the Sparse-group Lasso and its Adaptive Variant

The sparse-group lasso performs both variable and group selection, simultaneously using the strengths of the lasso and group lasso. It has found widespread use in genetics, a field that regularly involves the analysis of high-dimensional…

Machine Learning · Statistics 2025-09-18 Fabio Feser, Marina Evangelou

SURGIN: SURrogate-guided Generative INversion for subsurface multiphase flow with quantified uncertainty

We present a direct inverse modeling method named SURGIN, a SURrogate-guided Generative INversion framework tailored for subsurface multiphase flow data assimilation. Unlike existing inversion methods that require adaptation for each new…

Machine Learning · Statistics 2025-09-17 Zhao Feng, Bicheng Yan, Luanxiao Zhao, Xianda Shen, Renyu Zhao, Wenhao Wang, Fengshou Zhang

A Particle-Flow Algorithm for Free-Support Wasserstein Barycenters

The Wasserstein barycenter extends the Euclidean mean to the space of probability measures by minimizing the weighted sum of squared 2-Wasserstein distances. We develop a free-support algorithm for computing Wasserstein barycenters that…

Machine Learning · Statistics 2025-09-17 Kisung You

Minimax optimal transfer learning for high-dimensional additive regression

This paper studies high-dimensional additive regression under the transfer learning framework, where one observes samples from a target population together with auxiliary samples from different but potentially related regression models. We…

Machine Learning · Statistics 2025-09-17 Seung Hyun Moon

A Statistical Analysis of Deep Federated Learning for Intrinsically Low-dimensional Data

Despite significant research on the optimization aspects of federated learning, the exploration of generalization error, especially in the realm of heterogeneous federated learning, remains an area that has been insufficiently investigated,…

Machine Learning · Statistics 2025-09-17 Saptarshi Chakraborty, Peter L. Bartlett

Multi-task and few-shot learning in virtual flow metering

Recent literature has explored various ways to improve soft sensors by utilizing learning algorithms with transferability. A performance gain is generally attained when knowledge is transferred among strongly related soft sensor learning…

Machine Learning · Statistics 2025-09-17 Kristian Løvland, Bjarne Grimstad, Lars S. Imsland

The Morgan-Pitman Test of Equality of Variances and its Application to Machine Learning Model Evaluation and Selection

Model selection in non-linear models often prioritizes performance metrics over statistical tests, limiting the ability to account for sampling variability. We propose the use of a statistical test to assess the equality of variances in…

Machine Learning · Statistics 2025-09-16 Argimiro Arratia, Alejandra Cabaña, Ernesto Mordecki, Gerard Rovira-Parra

MMM: Clustering Multivariate Longitudinal Mixed-type Data

Multivariate longitudinal data of mixed type are increasingly collected in many scientific domains. However, algorithms to cluster this kind of data remain scarce, due to the challenge of simultaneously modeling the within- and between-time…

Machine Learning · Statistics 2025-09-16 Francesco Amato, Julien Jacques

E-ROBOT: a dimension-free method for robust statistics and machine learning via Schrödinger bridge

We propose the Entropic-regularized Robust Optimal Transport (E-ROBOT) framework, a novel method that combines the robustness of ROBOT with the computational and statistical benefits of entropic regularization. We show that, rooted in the…

Machine Learning · Statistics 2025-09-16 Davide La Vecchia, Hang Liu