Machine Learning - Scifaro

Signature Kernel Scoring Rule: A Spatio-Temporal Diagnostic for Probabilistic Weather Forecasting

Modern weather forecasting has increasingly transitioned from numerical weather prediction (NWP) to data-driven machine learning forecasting techniques. While these new models produce probabilistic forecasts to quantify uncertainty, their…

Machine Learning · Statistics 2026-05-01 Archer Dodson, Ritabrata Dutta

Contextual Online Uncertainty-Aware Preference Learning for Human Feedback

Reinforcement Learning from Human Feedback (RLHF) has become a pivotal paradigm in artificial intelligence to align large models with human preferences. In this paper, we propose a novel statistical framework to simultaneously conduct the…

Machine Learning · Statistics 2026-05-01 Nan Lu, Ethan Lee, Ethan X. Fang, Junwei Lu

Foreclassing: A new machine learning perspective on human decision making with temporal data

Time series forecasts are widely used to inform decisions. Human decision-makers interpret these forecasts, incorporate prior experience and uncertainty about future outcomes, and then make a decision. In this paper, we propose a new…

Machine Learning · Statistics 2026-05-01 Daniel Andrew Coulson, Martin T. Wells

The Polynomial Stein Discrepancy for Assessing Moment Convergence

We propose a novel method for measuring the discrepancy between a set of samples and a desired posterior distribution for Bayesian inference. Classical methods for assessing sample quality like the effective sample size are not appropriate…

Machine Learning · Statistics 2026-05-01 Narayan Srinivasan, Matthew Sutton, Christopher Drovandi, Leah F South

Modeling Spatial Extremal Dependence of Precipitation Using Distributional Neural Networks

In this work, we propose a simulation-based estimation approach using generative neural networks to determine dependencies of precipitation maxima and their underlying uncertainty in time and space. Within the common framework of max-stable…

Machine Learning · Statistics 2026-05-01 Christopher Bülte, Lisa Leimenstoll, Melanie Schienle

Laplace Approximation for Bayesian Tensor Network Kernel Machines

Uncertainty estimation is essential for robust decision-making in the presence of ambiguous or out-of-distribution inputs. Gaussian Processes (GPs) are classical kernel-based models that offer principled uncertainty quantification and…

Machine Learning · Statistics 2026-04-30 Albert Saiapin, Kim Batselier

Deep-testing: the case of dependence detection

Deep learning methods have proved highly effective for classification and image recognition problems. In this paper, we ask whether this success can be transferred to hypothesis testing: if a neural network can distinguish, for example, an…

Machine Learning · Statistics 2026-04-30 Gery Geenens, Pierre Lafaye de Micheaux, Ivan Muyun Zou

Probabilistic data quality assessment for structural monitoring data via outlier-resistant conditional diffusion model

Data quality assessment is an essential step that ensures the reliability of the subsequent structural health monitoring (SHM) tasks. This study proposes a prediction deviation-based SHM data quality assessment method using a univariate…

Machine Learning · Statistics 2026-04-30 Qi Li, Yong Huang, Hui Li

Robust Representation Learning through Explicit Environment Modeling

We consider learning from labeled data collected across multiple environments, where the data distribution may vary across these environments. This problem is commonly approached from a causal perspective, seeking invariant representations…

Machine Learning · Statistics 2026-04-30 Yuli Slavutsky, David M. Blei

Occam's Razor is Only as Sharp as Your ELBO

The marginal likelihood, also known as the evidence, is regarded as a mathematical embodiment of Occam's razor, enabling model selection that avoids overfitting. The evidence lower bound (ELBO) objective from variational inference has also…

Machine Learning · Statistics 2026-04-30 Ethan Harvey, Michael C. Hughes

Adversarial Robustness of NTK Neural Networks

Deep learning models are widely deployed in safety-critical domains, but remain vulnerable to adversarial attacks. In this paper, we study the adversarial robustness of NTK neural networks in the context of nonparametric regression. We…

Machine Learning · Statistics 2026-04-30 Yuxuan Hou

Probabilistic Graphical Model using Graph Neural Networks for Bayesian Inversion of Discrete Structural Component States

The health condition of components in civil infrastructures can be described by various discrete states according to their performance degradation. Inferring these states from measurable responses is typically an ill-posed inverse problem.…

Machine Learning · Statistics 2026-04-30 Teng Li, Stephen Wu, Yong Huang, James L. Beck, Hui Li

Concave Statistical Utility Maximization Bandits via Influence-Function Gradients

We study stochastic multi-armed bandits in which the objective is a statistical functional of the long-run reward distribution, rather than expected reward alone. Under mild continuity assumptions, we show that the infinite-horizon problem…

Machine Learning · Statistics 2026-04-30 Matías Carrasco, Alejandro Cholaquidis

A Tutorial Review of Bayesian Optimization with Gaussian Processes to Accelerate Stationary Point Searches

Building local surrogates to accelerate stationary point searches on potential energy surfaces spans decades of effort. Done correctly, surrogates can reduce the number of expensive electronic structure evaluations by roughly an order of…

Machine Learning · Statistics 2026-04-30 Rohit Goswami

NeuralFLoC: Neural Flow-Based Joint Registration and Clustering of Functional Data

Clustering functional data in the presence of phase variation is challenging, as temporal misalignment can obscure intrinsic shape differences and degrade clustering performance. Most existing approaches treat registration and clustering as…

Machine Learning · Statistics 2026-04-30 Xinyang Xiong, Siyuan Jiang, Pengcheng Zeng

Optimal differentially private kernel learning with random projection

Differential privacy has become a cornerstone in the development of privacy-preserving learning algorithms. This work addresses optimizing differentially private kernel learning within the empirical risk minimization (ERM) framework. We…

Machine Learning · Statistics 2026-04-30 Bonwoo Lee, Cheolwoo Park, Jeongyoun Ahn

Out-of-Distribution Generalization of In-Context Learning: A Low-Dimensional Subspace Perspective

The transformer's remarkable ability to perform in-context learning (ICL) has sparked a wide range of studies designed to understand its strengths and limitations. However, a theoretical understanding of when ICL can and cannot generalize…

Machine Learning · Statistics 2026-04-30 Soo Min Kwon, Alec S. Xu, Can Yaras, Laura Balzano, Qing Qu

Data Balancing Strategies: A Systematic Survey of Resampling and Augmentation Methods

Imbalanced datasets, where one class significantly outnumbers others, remain a persistent challenge in machine learning, often biasing predictions toward the majority class and degrading classifier performance. This paper provides a…

Machine Learning · Statistics 2026-04-30 Behnam Yousefimehr, Mehdi Ghatee, Javad Fazli, Shervin Ghaffari, Zahra Rafei, Mohammad Amin Seifi, Sajed Tavakoli, Abolfazl Nikahd, Mahdi Razi Gandomani, Alireza Orouji, Ramtin Mahmoudi Kashani, Sarina Heshmati, Negin Sadat Mousavi

DP-CDA: An Algorithm for Enhanced Privacy Preservation in Dataset Synthesis Through Randomized Mixing

In recent years, the growth of data across various sectors, including healthcare, security, finance, and education, has created significant opportunities for analysis and informed decision-making. However, these datasets often contain…

Machine Learning · Statistics 2026-04-30 Utsab Saha, Tanvir Muntakim Tonoy, Hafiz Imtiaz

Distribution-Free Stochastic Analysis and Robust Multilevel Vector Field Anomaly Detection

Massive vector field datasets are common in multi-spectral optical and radar sensors, among many other emerging areas of application. We develop a novel stochastic functional (data) analysis approach for detecting anomalies based on the…

Machine Learning · Statistics 2026-04-30 Julio E Castrillon-Candas, Michael Rosenbaum, Mark Kon