Machine Learning - Scifaro

Hypergraph Generation via Structured Stochastic Diffusion

Hypergraphs model higher-order interactions, but realistic hypergraph generation remains difficult because incidence, hyperedge-size heterogeneity, and overlap structure are not faithfully captured by pairwise reductions. We propose HEDGE,…

Machine Learning · Statistics 2026-05-07 Christopher Nemeth

Scalable inference of spatial regions and temporal signatures from time series

Regionalization aims to partition a spatial domain into contiguous regions that share similar characteristics, enabling more effective spatial analysis, policy making, and resource management. Existing approaches for spatial regionalization…

Machine Learning · Statistics 2026-05-07 Jiayu Weng, Alec Kirkley

Perturbation is All You Need for Extrapolating Language Models

We introduce a simple yet powerful framework for training large language models. In contrast to the standard autoregressive next-token prediction based on an exact prefix, we propose a perturbation-based procedure that first transforms the…

Machine Learning · Statistics 2026-05-07 Zetai Cen, Jin Zhu, Xinwei Shen, Chengchun Shi

Adapt or Forget: Provable Tradeoffs Between Adam and SGD in Nonstationary Optimization

We provide a theoretical analysis of Adam under non-stationary stochastic objectives, separating two regimes: Euclidean tracking under adaptive strong monotonicity of the Adam-preconditioned mean-gradient operator, and high-probability…

Machine Learning · Statistics 2026-05-07 Sharan Sahu, Abir Sarkar, Cameron J. Hogan, Martin T. Wells

Entropic Riemannian Neural Optimal Transport

Many machine learning problems involve data supported on curved spaces such as spheres, rotation groups, hyperbolic spaces, and general Riemannian manifolds, where Euclidean geometry can distort distances, averages, and the resulting…

Machine Learning · Statistics 2026-05-07 Alessandro Micheli, Silvia Sapora, Anthea Monod, Samir Bhatt

Heterogeneous Ordinal Structure Learning with Bayesian Nonparametric Complexity Discovery

Public attitudes toward artificial intelligence are heterogeneous, ordinally measured, and poorly captured by any single dependency graph. Existing ordinal structure learners assume a shared directed acyclic graph (DAG) across all…

Machine Learning · Statistics 2026-05-07 Amir Rafe, Subasish Das

A Consistency-Centric Approach to Set-Based Optimization with Multiple Models of Unranked Fidelity

In complex real-world settings, optimization is challenged by the presence of diverse models of differing fidelity. In many optimization problems, a single model is treated as the most accurate representation of the underlying system, while…

Machine Learning · Statistics 2026-05-07 Danielle F. Morey, Giulia Pedrielli, Cherry Y. Wakayama, Zelda B. Zabinsky

When LLMs get significantly worse: A statistical approach to detect model degradations

Minimizing the inference cost and latency of foundation models has become a crucial area of research. Optimization approaches include theoretically lossless methods and others without accuracy guarantees like quantization. In all of these…

Machine Learning · Statistics 2026-05-07 Jonas Kübler, Kailash Budhathoki, Matthäus Kleindessner, Xiong Zhou, Junming Yin, Ashish Khetan, George Karypis

Multivariate Time Series Data Imputation via Distributionally Robust Regularization

Multivariate time series imputation is often compromised by mismatch between the observed and true data distributions, a bias induced by the combined effects of time-series non-stationarity and systematic missingness. Standard methods that…

Machine Learning · Statistics 2026-05-07 Che-Yi Liao, Zheng Dong, Gian-Gabriel Garcia, Kamran Paynabar

Learning Time-Varying Graphs from Incomplete Graph Signals

This paper tackles the challenging problem of jointly inferring time-varying network topologies and imputing missing data from partially observed graph signals. We propose a unified non-convex optimization framework to simultaneously…

Machine Learning · Statistics 2026-05-07 Chuansen Peng, Xiaojing Shen

Echoes of the Past: A Unified Perspective on Fading memory and Echo States

Recurrent neural networks (RNNs) have become increasingly popular in information processing tasks involving time series and temporal data. A fundamental property of RNNs is their ability to create reliable input/output responses, often…

Machine Learning · Statistics 2026-05-07 Juan-Pablo Ortega, Florian Rossmannek

Scalable Policy Maximization Under Network Interference

Many interventions, such as vaccines in clinical trials or coupons in online marketplaces, must be assigned sequentially without full knowledge of their effects. Multi-armed bandit algorithms have proven successful in such settings.…

Machine Learning · Statistics 2026-05-07 Aidan Gleich, Eric Laber, Alexander Volfovsky

Improving Bias Correction Standards by Quantifying its Effects on Treatment Outcomes

With the growing access to administrative health databases, retrospective studies have become crucial evidence for medical treatments. Yet, non-randomized studies frequently face selection biases, requiring mitigation strategies. Propensity…

Machine Learning · Statistics 2026-05-07 Alexandre Abraham, Andrés Hoyos Idrobo

An Unconditional Representation of the Conditional Score in Infinite-Dimensional Linear Inverse Problems

Score-based diffusion models (SDMs) have emerged as a powerful tool for sampling from the posterior distribution in Bayesian inverse problems. However, existing methods often require multiple evaluations of the forward mapping to generate a…

Machine Learning · Statistics 2026-05-07 Fabian Schneider, Duc-Lam Duong, Matti Lassas, Maarten V. de Hoop, Tapio Helin

Conditional Diffusion Sampling

Sampling from unnormalized multimodal distributions with limited density evaluations remains a fundamental challenge in machine learning and natural sciences. Successful approaches construct a bridge between a tractable reference and the…

Machine Learning · Statistics 2026-05-06 Francisco M. Castro-Macías, Pablo Morales-Álvarez, Saifuddin Syed, Daniel Hernández-Lobato, Rafael Molina, José Miguel Hernández-Lobato

The Manokhin Probability Matrix: A Diagnostic Framework for Classifier Probability Quality

The Brier score conflates two distinct properties of probabilistic predictions: reliability (calibration error) and resolution (discriminatory power). We introduce the Manokhin Probability Matrix, a BCG-style two-dimensional diagnostic…

Machine Learning · Statistics 2026-05-06 Valery Manokhin

Training-Free Probabilistic Time-Series Forecasting with Conformal Seasonal Pools

We propose Conformal Seasonal Pools (CSP), a training-free probabilistic time-series forecaster that mixes same-season empirical draws with signed residual draws around a seasonal naive forecast. In an audited rolling-origin benchmark on…

Machine Learning · Statistics 2026-05-06 Valery Manokhin

Low Rank Tensor Completion via Adaptive ADMM

We consider a novel algorithm for the completion of partially observed low-rank tensors, as a generalization of matrix completion. The proposed low-rank tensor completion (TC) method builds on the conventional nuclear norm (NN)…

Machine Learning · Statistics 2026-05-06 Niclas Führling, Getuar Rexhepi, Giuseppe Thadeu Freitas de Abreu

Predicting missing values: A good idea?

Minimizing the Mean Squared Error (MSE) is a key objective in machine learning and is commonly used for imputing missing values. While this approach provides accurate point estimates, it introduces systematic biases in downstream analyses.…

Machine Learning · Statistics 2026-05-06 Stef van Buuren

Tempered Guided Diffusion

Training-free conditional diffusion provides a flexible alternative to task-specific conditional model training, but existing samplers often allocate computation inefficiently: independent guided trajectories can vary widely in quality, and…

Machine Learning · Statistics 2026-05-06 Andreas Makris, Paul Fearnhead, Chris Nemeth