Machine Learning - Scifaro

Statistical Learning for Heterogeneous Treatment Effects: Pretraining, Prognosis, and Prediction

Robust estimation of heterogeneous treatment effects is a fundamental challenge for optimal decision-making in domains ranging from personalized medicine to educational policy. In recent years, predictive machine learning has emerged as a…

Machine Learning · Statistics 2025-06-23 Maximilian Schuessler, Erik Sverdrup, Robert Tibshirani

Neural Guided Diffusion Bridges

We propose a novel method for simulating conditioned diffusion processes (diffusion bridges) in Euclidean spaces. By training a neural network to approximate bridge dynamics, our approach eliminates the need for computationally intensive…

Machine Learning · Statistics 2025-06-23 Gefan Yang, Frank van der Meulen, Stefan Sommer

Robust Score Matching

Proposed in Hyvärinen (2005), score matching is a parameter estimation procedure that does not require computation of distributional normalizing constants. In this work we utilize the geometric median of means to develop a robust score…

Machine Learning · Statistics 2025-06-23 Richard Schwank, Andrew McCormack, Mathias Drton

Belted and Ensembled Neural Network for Linear and Nonlinear Sufficient Dimension Reduction

We introduce a unified, flexible, and easy-to-implement framework of sufficient dimension reduction that can accommodate both linear and nonlinear dimension reduction, and both the conditional distribution and the conditional mean as the…

Machine Learning · Statistics 2025-06-23 Yin Tang, Bing Li

Modeling Epidemic Spread: A Gaussian Process Regression Approach

Modeling epidemic spread is critical for informing policy decisions aimed at mitigation. Accordingly, in this work we present a new data-driven method based on Gaussian process regression (GPR) to model epidemic spread through the…

Machine Learning · Statistics 2025-06-23 Baike She, Lei Xin, Philip E. Paré, Matthew Hale

Time-dependent density estimation using binary classifiers

We propose a data-driven method to learn the time-dependent probability density of a multivariate stochastic process from sample paths, assuming that the initial probability density is known and can be evaluated. Our method uses a novel…

Machine Learning · Statistics 2025-06-19 Agnimitra Dasgupta, Javier Murgoitio-Esandi, Ali Fardisi, Assad A Oberai

An Observation on Lloyd's k-Means Algorithm in High Dimensions

Clustering and estimating cluster means are core problems in statistics and machine learning, with k-means and Expectation Maximization (EM) being two widely used algorithms. In this work, we provide a theoretical explanation for the…

Machine Learning · Statistics 2025-06-19 David Silva-Sánchez, Roy R. Lederman

Distributionally-Constrained Adversaries in Online Learning

There has been much recent interest in understanding the continuum from adversarial to stochastic settings in online learning, with various frameworks including smoothed settings proposed to bridge this gap. We consider the more general and…

Machine Learning · Statistics 2025-06-19 Moïse Blanchard, Samory Kpotufe

Optimal Scheduling of Dynamic Transport

Flow-based methods for sampling and generative modeling use continuous-time dynamical systems to represent a transport map that pushes forward a source measure to a target measure. The introduction of a time axis provides considerable…

Machine Learning · Statistics 2025-06-19 Panos Tsimpos, Zhi Ren, Jakob Zech, Youssef Marzouk

Sparsity-Based Interpolation of External, Internal and Swap Regret

Focusing on the expert problem in online learning, this paper studies the interpolation of several performance metrics via $\phi$-regret minimization, which measures the total loss of an algorithm by its regret with respect to an arbitrary…

Machine Learning · Statistics 2025-06-19 Zhou Lu, Y. Jennifer Sun, Zhiyu Zhang

Sharp Generalization Bounds for Foundation Models with Asymmetric Randomized Low-Rank Adapters

Low-Rank Adaptation (LoRA) has emerged as a widely adopted parameter-efficient fine-tuning (PEFT) technique for foundation models. Recent work has highlighted an inherent asymmetry in the initialization of LoRA's low-rank factors, which has…

Machine Learning · Statistics 2025-06-18 Anastasis Kratsios, Tin Sum Cheng, Aurelien Lucchi, Haitz Sáez de Ocáriz Borde

Adaptive Data Augmentation for Thompson Sampling

In linear contextual bandits, the objective is to select actions that maximize cumulative rewards, modeled as a linear function with unknown parameters. Although Thompson Sampling performs well empirically, it does not achieve optimal…

Machine Learning · Statistics 2025-06-18 Wonyoung Kim

Estimation of Treatment Effects in Extreme and Unobserved Data

Causal effect estimation seeks to determine the impact of an intervention from observational data. However, the existing causal inference literature primarily addresses treatment effects on frequently occurring events. But what if we are…

Machine Learning · Statistics 2025-06-18 Jiyuan Tan, Jose Blanchet, Vasilis Syrgkanis

Mirror Descent Using the Tempesta Generalized Multi-parametric Logarithms

In this paper, we develop a wide class of Mirror Descent (MD) algorithms, which play a key role in machine learning. For this purpose we formulate a constrained optimization problem in which we exploit the Bregman divergence with the…

Machine Learning · Statistics 2025-06-18 Andrzej Cichocki

Bridging Unsupervised and Semi-Supervised Anomaly Detection: A Theoretically-Grounded and Practical Framework with Synthetic Anomalies

Anomaly detection (AD) is a critical task across domains such as cybersecurity and healthcare. In the unsupervised setting, an effective and theoretically-grounded principle is to train classifiers to distinguish normal data from…

Machine Learning · Statistics 2025-06-18 Matthew Lau, Tian-Yi Zhou, Xiangchi Yuan, Jizhou Chen, Wenke Lee, Xiaoming Huo

Meta Optimality for Demographic Parity Constrained Regression via Post-Processing

We address the regression problem under the constraint of demographic parity, a commonly used fairness definition. Recent studies have revealed fair minimax optimal regression algorithms, the most accurate algorithms that adhere to the…

Machine Learning · Statistics 2025-06-18 Kazuto Fukuchi

Rademacher learning rates for iterated random functions

Most existing literature on supervised machine learning assumes that the training dataset is an i.i.d. sample. However, many real-world problems exhibit temporal dependence and strong correlations between the marginal…

Machine Learning · Statistics 2025-06-18 Nikola Sandrić

Beyond Shapley Values: Cooperative Games for the Interpretation of Machine Learning Models

Cooperative game theory has become a cornerstone of post-hoc interpretability in machine learning, largely through the use of Shapley values. Yet, despite their widespread adoption, Shapley-based methods often rest on axiomatic…

Machine Learning · Statistics 2025-06-18 Marouane Il Idrissi, Agathe Fernandes Machado, Arthur Charpentier

Experimental Design for Semiparametric Bandits

We study finite-armed semiparametric bandits, where each arm's reward combines a linear component with an unknown, potentially adversarial shift. This model strictly generalizes classical linear bandits and reflects complexities common in…

Machine Learning · Statistics 2025-06-18 Seok-Jin Kim, Gi-Soo Kim, Min-hwan Oh

Spline Dimensional Decomposition with Interpolation-based Optimal Knot Selection for Stochastic Dynamic Analysis

Forward uncertainty quantification in dynamical systems is challenging due to non-smooth or locally oscillating nonlinear behaviors. Spline dimensional decomposition (SDD) addresses such nonlinearity by partitioning input coordinates via…

Machine Learning · Statistics 2025-06-18 Yeonsu Kim, Junhan Lee, Bingran Wang, John T. Hwang, Dongjin Lee