Machine Learning - Scifaro

Diffusion Models for Time Series Forecasting: A Survey

Diffusion models, initially developed for image synthesis, demonstrate remarkable generative capabilities. Recently, their application has expanded to time series forecasting (TSF), yielding promising results. Existing surveys on time…

Machine Learning · Statistics 2025-09-03 Chen Su, Zhengzhou Cai, Yuanhe Tian, Zhuochao Chang, Zihong Zheng, Yan Song

A Generalization Theory for Zero-Shot Prediction

A modern paradigm for generalization in machine learning and AI consists of pre-training a task-agnostic foundation model, generally obtained using self-supervised and multimodal contrastive learning. The resulting representations can be…

Machine Learning · Statistics 2025-09-03 Ronak Mehta, Zaid Harchaoui

Gradient-free stochastic optimization for additive models

We address the problem of zero-order optimization from noisy observations for an objective function satisfying the Polyak-Łojasiewicz or the strong convexity condition. Additionally, we assume that the objective function has an additive…

Machine Learning · Statistics 2025-09-03 Arya Akhavan, Alexandre B. Tsybakov

Two-Sided Nearest Neighbors: An adaptive and minimax optimal procedure for matrix completion

Nearest neighbor (NN) algorithms have been extensively used for missing data problems in recommender systems and sequential decision-making systems. Prior theoretical analysis has established favorable guarantees for NN when the underlying…

Machine Learning · Statistics 2025-09-03 Tathagata Sadhukhan, Manit Paul, Raaz Dwivedi

ODTlearn: A Package for Learning Optimal Decision Trees for Prediction and Prescription

ODTLearn is an open-source Python package that provides methods for learning optimal decision trees for high-stakes predictive and prescriptive tasks based on the mixed-integer optimization (MIO) framework proposed in (Aghaei et al., 2021)…

Machine Learning · Statistics 2025-09-03 Patrick Vossler, Sina Aghaei, Nathan Justin, Nathanael Jo, Andrés Gómez, Phebe Vayanos

A Flexible Framework for Incorporating Patient Preferences Into Q-Learning

In real-world healthcare settings, treatment decisions often involve optimizing for multivariate outcomes such as treatment efficacy and severity of side effects based on individual preferences. However, existing statistical methods for…

Machine Learning · Statistics 2025-09-03 Joshua P. Zitovsky, Yating Zou, Leslie Wilson, Michael R. Kosorok

Extending Model-x Framework to Missing Data

One limitation of most statistical and machine learning-based variable selection approaches is their inability to control false selections. A recently introduced framework, model-x knockoffs, provides this control for a wide range of models but…

Machine Learning · Statistics 2025-09-03 Deniz Koyuncu, Alex Gittens, Bülent Yener

Adaptive generative moment matching networks for improved learning of dependence structures

An adaptive bandwidth selection procedure for the mixture kernel in the maximum mean discrepancy (MMD) for fitting generative moment matching networks (GMMNs) is introduced, and its ability to improve the learning of copula random number…

Machine Learning · Statistics 2025-09-01 Marius Hofert, Gan Yao

Weighted Support Points from Random Measures: An Interpretable Alternative for Generative Modeling

Support points summarize a large dataset through a smaller set of representative points that can be used for data operations, such as Monte Carlo integration, without requiring access to the full dataset. In this sense, support points offer…

Machine Learning · Statistics 2025-09-01 Peiqi Zhao, Carlos E. Rodríguez, Ramsés H. Mena, Stephen G. Walker

From stability of Langevin diffusion to convergence of proximal MCMC for non-log-concave sampling

We consider the problem of sampling distributions stemming from non-convex potentials with the Unadjusted Langevin Algorithm (ULA). We prove the stability of the discrete-time ULA to drift approximations under the assumption that the potential…

Machine Learning · Statistics 2025-09-01 Marien Renaud, Valentin De Bortoli, Arthur Leclaire, Nicolas Papadakis

Effective Method for Inverse Ising Problem under Missing Observations in Restricted Boltzmann Machines

Restricted Boltzmann machines (RBMs) are energy-based models analogous to the Ising model and are widely applied in statistical machine learning. The standard inverse Ising problem with a complete dataset requires computing both data and…

Machine Learning · Statistics 2025-09-01 Kaiji Sekimoto, Muneki Yasuda

Learning covariate importance for matching in policy-relevant observational research

Matching methods are widely used to reduce confounding effects in observational studies, but conventional approaches often treat all covariates as equally important, which can result in poor performance when covariates differ in their…

Machine Learning · Statistics 2025-09-01 Hongzhe Zhang, Jiasheng Shi, Jing Huang

Guaranteed Nonconvex Factorization Approach for Tensor Train Recovery

In this paper, we provide the first convergence guarantee for the factorization approach. Specifically, to avoid the scaling ambiguity and to facilitate theoretical analysis, we optimize over the so-called left-orthogonal TT format which…

Machine Learning · Statistics 2025-09-01 Zhen Qin, Michael B. Wakin, Zhihui Zhu

Transfer Learning for Classification under Decision Rule Drift with Application to Optimal Individualized Treatment Rule Estimation

In this paper, we extend the transfer learning classification framework from regression function-based methods to decision rules. We propose a novel methodology for modeling posterior drift through Bayes decision rules. By exploiting the…

Machine Learning · Statistics 2025-08-29 Xiaohan Wang, Yang Ning

Polynomial Chaos Expansion for Operator Learning

Operator learning (OL) has emerged as a powerful tool in scientific machine learning (SciML) for approximating mappings between infinite-dimensional functional spaces. One of its main applications is learning the solution operator of…

Machine Learning · Statistics 2025-08-29 Himanshu Sharma, Lukáš Novák, Michael D. Shields

Stochastic Gradients under Nuisances

Stochastic gradient optimization is the dominant learning paradigm for a variety of scenarios, from classical supervised learning to modern self-supervised learning. We consider stochastic gradient algorithms for learning problems whose…

Machine Learning · Statistics 2025-08-29 Facheng Yu, Ronak Mehta, Alex Luedtke, Zaid Harchaoui

Canonical Bayesian Linear System Identification

Standard Bayesian approaches for linear time-invariant (LTI) system identification are hindered by parameter non-identifiability; the resulting complex, multi-modal posteriors make inference inefficient and impractical. We solve this…

Machine Learning · Statistics 2025-08-29 Andrey Bryutkin, Matthew E. Levine, Iñigo Urteaga, Youssef Marzouk

Random Feature Representation Boosting

We introduce Random Feature Representation Boosting (RFRBoost), a novel method for constructing deep residual random feature neural networks (RFNNs) using boosting theory. RFRBoost uses random features at each layer to learn the functional…

Machine Learning · Statistics 2025-08-29 Nikita Zozoulenko, Thomas Cass, Lukas Gonon

Unraveling the Interplay between Carryover Effects and Reward Autocorrelations in Switchback Experiments

A/B testing has become the gold standard for policy evaluation in modern technological industries. Motivated by the widespread use of switchback experiments in A/B testing, this paper conducts a comprehensive comparative analysis of various…

Machine Learning · Statistics 2025-08-29 Qianglin Wen, Chengchun Shi, Ying Yang, Niansheng Tang, Hongtu Zhu

Conditional Normalizing Flow Surrogate for Monte Carlo Prediction of Radiative Properties in Nanoparticle-Embedded Layers

We present a probabilistic, data-driven surrogate model for predicting the radiative properties of nanoparticle-embedded scattering media. The model uses conditional normalizing flows, which learn the conditional distribution of optical…

Machine Learning · Statistics 2025-08-28 Fahime Seyedheydari, Kevin Conley, Simo Särkkä