Machine Learning - Scifaro

Hybrid Generative Modeling for Incomplete Physics: Deep Grey-Box Meets Optimal Transport

Physics phenomena are often described by ordinary and/or partial differential equations (ODEs/PDEs), and solved analytically or numerically. Unfortunately, many real-world systems are described only approximately with missing or unknown…

Machine Learning · Statistics 2025-06-30 Gurjeet Sangra Singh, Maciej Falkiewicz, Alexandros Kalousis

Classification with Reject Option: Distribution-free Error Guarantees via Conformal Prediction

Machine learning (ML) models always make a prediction, even when they are likely to be wrong. This causes problems in practical applications, as we do not know if we should trust a prediction. ML with reject option addresses this issue by…

Machine Learning · Statistics 2025-06-30 Johan Hallberg Szabadváry, Tuwe Löfström, Ulf Johansson, Cecilia Sönströd, Ernst Ahlberg, Lars Carlsson

Modification of a Numerical Method Using FIR Filters in a Time-dependent SIR Model for COVID-19

Authors Yi-Cheng Chen, Ping-En Lu, Cheng-Shang Chang, and Tzu-Hsuan Liu use the Finite Impulse Response (FIR) linear system filtering method to track and predict the number of people infected and recovered from COVID-19, in a pandemic…

Machine Learning · Statistics 2025-06-30 Felipe Rogério Pimentel, Rafael Gustavo Alves

Performance of Rank-One Tensor Approximation on Incomplete Data

We are interested in the estimation of a rank-one tensor signal when only a portion $\varepsilon$ of its noisy observation is available. We show that the study of this problem can be reduced to that of a random matrix model whose spectral…

Machine Learning · Statistics 2025-06-30 Hugo Lebeau

Computational Efficient and Minimax Optimal Nonignorable Matrix Completion

While the matrix completion problem has attracted considerable attention over the decades, few works address the nonignorable missing issue and all have their limitations. In this article, we propose a nuclear norm regularized row- and…

Machine Learning · Statistics 2025-06-30 Yuanhong A, Guoyu Zhang, Yongcheng Zeng, Bo Zhang

Learning Data-Driven Uncertainty Set Partitions for Robust and Adaptive Energy Forecasting with Missing Data

Short-term forecasting models typically assume the availability of input data (features) when they are deployed and in use. However, equipment failures, disruptions, or cyberattacks may lead to missing features when such models are used…

Machine Learning · Statistics 2025-06-30 Akylas Stratigakos, Panagiotis Andrianesis

Green LIME: Improving AI Explainability through Design of Experiments

In artificial intelligence (AI), the complexity of many models and processes surpasses human understanding, making it challenging to determine why a specific prediction is made. This lack of transparency is particularly problematic in…

Machine Learning · Statistics 2025-06-30 Alexandra Stadler, Werner G. Müller, Radoslav Harman

Learning Networks from Wide-Sense Stationary Stochastic Processes

Complex networked systems driven by latent inputs are common in fields like neuroscience, finance, and engineering. A key inference problem here is to learn edge connectivity from node outputs (potentials). We focus on systems governed by…

Machine Learning · Statistics 2025-06-30 Anirudh Rayas, Jiajun Cheng, Rajasekhar Anguluri, Deepjyoti Deka, Gautam Dasarathy

Convergence of flow-based generative models via proximal gradient descent in Wasserstein space

Flow-based generative models enjoy certain advantages in computing the data generation and the likelihood, and have recently shown competitive empirical performance. Compared to the accumulating theoretical studies on related score-based…

Machine Learning · Statistics 2025-06-30 Xiuyuan Cheng, Jianfeng Lu, Yixin Tan, Yao Xie

Statistical Inference of the Value Function for Reinforcement Learning in Infinite Horizon Settings

Reinforcement learning is a general technique that allows an agent to learn an optimal policy and interact with an environment in sequential decision making problems. The goodness of a policy is measured by its value function starting from…

Machine Learning · Statistics 2025-06-30 C. Shi, S. Zhang, W. Lu, R. Song

Gaussian Invariant Markov Chain Monte Carlo

We develop sampling methods, which consist of Gaussian invariant versions of random walk Metropolis (RWM), Metropolis adjusted Langevin algorithm (MALA) and second order Hessian or Manifold MALA. Unlike standard RWM and MALA we show that…

Machine Learning · Statistics 2025-06-27 Michalis K. Titsias, Angelos Alexopoulos, Siran Liu, Petros Dellaportas

Homogenization of Multi-agent Learning Dynamics in Finite-state Markov Games

This paper introduces a new approach for approximating the learning dynamics of multiple reinforcement learning (RL) agents interacting in a finite-state Markov game. The idea is to rescale the learning process by simultaneously reducing…

Machine Learning · Statistics 2025-06-27 Yann Kerzreho

Forecasting Geopolitical Events with a Sparse Temporal Fusion Transformer and Gaussian Process Hybrid: A Case Study in Middle Eastern and U.S. Conflict Dynamics

Forecasting geopolitical conflict from data sources like the Global Database of Events, Language, and Tone (GDELT) is a critical challenge for national security. The inherent sparsity, burstiness, and overdispersion of such data cause…

Machine Learning · Statistics 2025-06-27 Hsin-Hsiung Huang, Hayden Hampton

Sharp concentration of uniform generalization errors in binary linear classification

We examine the concentration of uniform generalization errors around their expectation in binary linear classification problems via an isoperimetric argument. In particular, we establish Poincaré and log-Sobolev inequalities for the…

Machine Learning · Statistics 2025-06-27 Shogo Nakakita

SDE Matching: Scalable and Simulation-Free Training of Latent Stochastic Differential Equations

The Latent Stochastic Differential Equation (SDE) is a powerful tool for time series and sequence modeling. However, training Latent SDEs typically relies on adjoint sensitivity methods, which depend on simulation and backpropagation…

Machine Learning · Statistics 2025-06-27 Grigory Bartosh, Dmitry Vetrov, Christian A. Naesseth

High-dimensional Contextual Bandit Problem without Sparsity

In this research, we investigate the high-dimensional linear contextual bandit problem where the number of features $p$ is greater than the budget $T$, or it may even be infinite. Differing from the majority of previous works in this field,…

Machine Learning · Statistics 2025-06-27 Junpei Komiyama, Masaaki Imaizumi

Valid Selection among Conformal Sets

Conformal prediction offers a distribution-free framework for constructing prediction sets with coverage guarantees. In practice, multiple valid conformal prediction sets may be available, arising from different models or methodologies.…

Machine Learning · Statistics 2025-06-26 Mahmoud Hegazy, Liviu Aolaritei, Michael I. Jordan, Aymeric Dieuleveut

DRO-Augment Framework: Robustness by Synergizing Wasserstein Distributionally Robust Optimization and Data Augmentation

In many real-world applications, ensuring the robustness and stability of deep neural networks (DNNs) is crucial, particularly for image classification tasks that encounter various input perturbations. While data augmentation techniques…

Machine Learning · Statistics 2025-06-26 Jiaming Hu, Debarghya Mukherjee, Ioannis Ch. Paschalidis

Scalable Machine Learning Algorithms using Path Signatures

The interface between stochastic analysis and machine learning is a rapidly evolving field, with path signatures - iterated integrals that provide faithful, hierarchical representations of paths - offering a principled and universal feature…

Machine Learning · Statistics 2025-06-26 Csaba Tóth

Identifying Heterogeneity in Distributed Learning

We study methods for identifying heterogeneous parameter components in distributed M-estimation with minimal data transmission. One is based on a re-normalized Wald test, which is shown to be consistent as long as the number of distributed…

Machine Learning · Statistics 2025-06-26 Zelin Xiao, Jia Gu, Song Xi Chen