Machine Learning - Scifaro

Unified Taxonomy for Multivariate Time Series Anomaly Detection using Deep Learning

The topic of Multivariate Time Series Anomaly Detection (MTSAD) has grown rapidly over the past years, with a steady rise in publications and Deep Learning (DL) models becoming the dominant paradigm. To address the lack of systematization…

Machine Learning · Statistics 2026-04-27 Bruna Alves, Armando J. Pinho, Sónia Gouveia

Calibrated Principal Component Regression

We propose a new method for statistical inference in generalized linear models. In the overparameterized regime, Principal Component Regression (PCR) reduces variance by projecting high-dimensional data to a low-dimensional principal…

Machine Learning · Statistics 2026-04-27 Yixuan Florence Wu, Yilun Zhu, Lei Cao, Naichen Shi

Online Multivariate Regularized Distributional Regression for High-dimensional Probabilistic Electricity Price Forecasting

Probabilistic electricity price forecasting (PEPF) is vital for short-term electricity markets, yet the multivariate nature of day-ahead prices - spanning 24 consecutive hours - remains underexplored. At the same time, real-time…

Machine Learning · Statistics 2026-04-27 Simon Hirsch

On Pareto Optimality for Parametric Choice Bandits

We study online assortment optimization under stochastic choice when a decision maker simultaneously values cumulative revenue performance and the quality of post-hoc inference on revenue contrasts. We analyze a forced-exploration…

Machine Learning · Statistics 2026-04-27 Jierui Zuo, Hanzhang Qin

Online Distributional Regression

Large-scale streaming data are common in modern machine learning applications and have led to the development of online learning algorithms. Many fields, such as supply chain management, weather and meteorology, energy markets, and finance,…

Machine Learning · Statistics 2026-04-27 Simon Hirsch, Jonathan Berrisch, Florian Ziel

Machine Learning Construction: implications to cybersecurity

Statistical learning is the process of estimating an unknown probabilistic input-output relationship of a system using a limited number of observations. A statistical learning machine (SLM) is the algorithm, function, model, or rule, that…

Machine Learning · Statistics 2026-04-26 Waleed A. Yousef

Revealing Geography-Driven Signals in Zone-Level Claim Frequency Models: An Empirical Study using Environmental and Visual Predictors

Geographic context is often considered relevant to motor insurance risk, yet public actuarial datasets provide limited location identifiers, constraining how this information can be incorporated and evaluated in claim-frequency models. This…

Machine Learning · Statistics 2026-04-24 Sherly Alfonso-Sánchez, Cristián Bravo, Kristina G. Stankova

Beyond Expected Information Gain: Stable Bayesian Optimal Experimental Design with Integral Probability Metrics and Plug-and-Play Extensions

Bayesian Optimal Experimental Design (BOED) provides a rigorous framework for decision-making tasks in which data acquisition is often the critical bottleneck, especially in resource-constrained settings. Traditionally, BOED…

Machine Learning · Statistics 2026-04-24 Di Wu, Ling Liang, Haizhao Yang

There Will Be a Scientific Theory of Deep Learning

In this paper, we make the case that a scientific theory of deep learning is emerging. By this we mean a theory which characterizes important properties and statistics of the training process, hidden representations, final weights, and…

Machine Learning · Statistics 2026-04-24 Jamie Simon, Daniel Kunin, Alexander Atanasov, Enric Boix-Adserà, Blake Bordelon, Jeremy Cohen, Nikhil Ghosh, Florentin Guth, Arthur Jacot, Mason Kamb, Dhruva Karkada, Eric J. Michaud, Berkan Ottlik, Joseph Turnbull

A Kernel Nonconformity Score for Multivariate Conformal Prediction

Multivariate conformal prediction requires nonconformity scores that compress residual vectors into scalars while preserving certain implicit geometric structure of the residual distribution. We introduce a Multivariate Kernel Score (MKS)…

Machine Learning · Statistics 2026-04-24 Louis Meyer, Wenkai Xu

A single algorithm for both restless and rested rotting bandits

In many application domains (e.g., recommender systems, intelligent tutoring systems), the rewards associated with actions tend to decrease over time. This decay is either caused by the actions executed in the past (e.g., a user may get…

Machine Learning · Statistics 2026-04-24 Julien Seznec, Pierre Ménard, Alessandro Lazaric, Michal Valko

CLT-Optimal Parameter Error Bounds for Linear System Identification

There has been remarkable progress over the past decade in establishing finite-sample, non-asymptotic bounds on recovering unknown system parameters from observed system behavior. Surprisingly, however, we show that the current…

Machine Learning · Statistics 2026-04-24 Yichen Zhou, Stephen Tu

Calibeating Prediction-Powered Inference

We study semisupervised mean estimation with a small labeled sample, a large unlabeled sample, and a black-box prediction model whose output may be miscalibrated. A standard approach in this setting is augmented inverse-probability…

Machine Learning · Statistics 2026-04-24 Lars van der Laan, Mark Van Der Laan

Refining Covariance Matrix Estimation in Stochastic Gradient Descent Through Bias Reduction

We study online inference and asymptotic covariance estimation for the stochastic gradient descent (SGD) algorithm. While classical methods (such as plug-in and batch-means estimators) are available, they either require inaccessible…

Machine Learning · Statistics 2026-04-24 Ziyang Wei, Wanrong Zhu, Jingyang Lyu, Wei Biao Wu

Learning to Emulate Chaos: Adversarial Optimal Transport Regularization

Chaos arises in many complex dynamical systems, from weather to power grids, but is difficult to accurately model using data-driven emulators, including neural operator architectures. For chaotic systems, the inherent sensitivity to initial…

Machine Learning · Statistics 2026-04-24 Gabriel Melo, Leonardo Santiago, Peter Y. Lu

Achieving the Kesten-Stigum bound in the non-uniform hypergraph stochastic block model

We study the community detection problem in the non-uniform hypergraph stochastic block model (HSBM), where hyperedges of varying sizes coexist. This setting captures higher-order and multi-view interactions and raises a fundamental…

Machine Learning · Statistics 2026-04-24 Manuel Fernandez, Ludovic Stephan, Yizhe Zhu

Decentralized Machine Learning with Centralized Performance Guarantees via Gibbs Algorithms

In this paper, it is shown, for the first time, that centralized performance is achievable in decentralized learning without sharing the local datasets. Specifically, when clients adopt an empirical risk minimization with relative-entropy…

Machine Learning · Statistics 2026-04-24 Yaiza Bermudez, Samir M. Perlaza, Iñaki Esnaola

PDGMM-VAE: A Variational Autoencoder with Adaptive Per-Dimension Gaussian Mixture Model Priors for Nonlinear ICA

Independent component analysis is a core framework within blind source separation for recovering latent source signals from observed mixtures under statistical independence assumptions. In this work, we propose PDGMM-VAE, a source-oriented…

Machine Learning · Statistics 2026-04-24 Yuan-Hao Wei, Yan-Jie Sun

Spatio-temporal probabilistic forecast using MMAF-guided learning

We present a theory-guided generalized Bayesian methodology for spatio-temporal raster data, which we use to train an ensemble of stochastic feed-forward neural networks with Gaussian-distributed weights. The methodology incorporates the…

Machine Learning · Statistics 2026-04-24 Leonardo Bardi, Imma Valentina Curato, Lorenzo Proietti

Generalization Properties of Score-matching Diffusion Models for Intrinsically Low-dimensional Data

Despite the remarkable empirical success of score-based diffusion models, their statistical guarantees remain underdeveloped. Existing analyses often provide pessimistic convergence rates that do not reflect the intrinsic low-dimensional…

Machine Learning · Statistics 2026-04-24 Saptarshi Chakraborty, Quentin Berthet, Peter L. Bartlett