Machine Learning - Scifaro

A Spectral-Grassmann Wasserstein metric for operator representations of dynamical systems

The geometry of dynamical systems estimated from trajectory data is a major challenge for machine learning applications. Koopman and transfer operators provide a linear representation of nonlinear dynamics through their spectral…

Machine Learning · Statistics 2025-09-30 Thibaut Germain, Rémi Flamary, Vladimir R. Kostic, Karim Lounici

MAD: Manifold Attracted Diffusion

Score-based diffusion models are a highly effective method for generating samples from a distribution of images. We consider scenarios where the training data comes from a noisy version of the target distribution, and present an efficiently…

Machine Learning · Statistics 2025-09-30 Dennis Elbrächter, Giovanni S. Alberti, Matteo Santacesaria

PEARL: Performance-Enhanced Aggregated Representation Learning

Representation learning is a key technique in modern machine learning that enables models to identify meaningful patterns in complex data. However, different methods tend to extract distinct aspects of the data, and relying on a single…

Machine Learning · Statistics 2025-09-30 Wenhui Li, Shijin Gong, Xinyu Zhang

ActiveCQ: Active Estimation of Causal Quantities

Estimating causal quantities (CQs) typically requires large datasets, which can be expensive to obtain, especially when measuring individual outcomes is costly. This challenge highlights the importance of sample-efficient active learning…

Machine Learning · Statistics 2025-09-30 Erdun Gao, Dino Sejdinovic

End-to-End Deep Learning for Predicting Metric Space-Valued Outputs

Many modern applications involve predicting structured, non-Euclidean outputs such as probability distributions, networks, and symmetric positive-definite matrices. These outputs are naturally modeled as elements of general metric spaces,…

Machine Learning · Statistics 2025-09-30 Yidong Zhou, Su I Iao, Hans-Georg Müller

A Generative Model for Controllable Feature Heterophily in Graphs

We introduce a principled generative framework for graph signals that enables explicit control of feature heterophily, a key property underlying the effectiveness of graph learning methods. Our model combines a Lipschitz graphon-based…

Machine Learning · Statistics 2025-09-30 Haoyu Wang, Renyuan Ma, Gonzalo Mateos, Luana Ruiz

Conditional Risk Minimization with Side Information: A Tractable, Universal Optimal Transport Framework

Conditional risk minimization arises in high-stakes decisions where risk must be assessed in light of side information, such as stressed economic conditions, specific customer profiles, or other contextual covariates. Constructing reliable…

Machine Learning · Statistics 2025-09-30 Xinqiao Xie, Jonathan Yu-Meng Li

Statistical Inference for Gradient Boosting Regression

Gradient boosting is widely popular due to its flexibility and predictive accuracy. However, statistical inference and uncertainty quantification for gradient boosting remain challenging and under-explored. We propose a unified framework…

Machine Learning · Statistics 2025-09-30 Haimo Fang, Kevin Tan, Giles Hooker

Unsupervised Conformal Inference: Bootstrapping and Alignment to Control LLM Uncertainty

Deploying black-box LLMs requires managing uncertainty in the absence of token-level probabilities or true labels. We propose an unsupervised conformal inference framework for generation, which integrates: generative models,…

Machine Learning · Statistics 2025-09-30 Lingyou Pang, Lei Huang, Jianyu Lin, Tianyu Wang, Akira Horiguchi, Alexander Aue, Carey E. Priebe

Localized Uncertainty Quantification in Random Forests via Proximities

In machine learning, uncertainty quantification helps assess the reliability of model predictions, which is important in high-stakes scenarios. Traditional approaches often emphasize predictive accuracy, but there is a growing focus on…

Machine Learning · Statistics 2025-09-30 Jake S. Rhodes, Scott D. Brown, J. Riley Wilkinson

Label-Guided Imputation via Forest-Based Proximities for Improved Time Series Classification

Missing data is a common problem in time series data. Most methods for imputation ignore label information pertaining to the time series even if that information exists. In this paper, we provide a framework for missing data imputation in…

Machine Learning · Statistics 2025-09-30 Jake S. Rhodes, Adam G. Rustad, Sofia Pelagalli Maia, Evan Thacker, Hyunmi Choi, Jose Gutierrez, Tatjana Rundek, Ben Shaw

A theoretical guarantee for SyncRank

We present a theoretical and empirical analysis of the SyncRank algorithm for recovering a global ranking from noisy pairwise comparisons. By adopting a complex-valued data model where the true ranking is encoded in the phases of a…

Machine Learning · Statistics 2025-09-30 Yang Rao

Identifying Memory Effects in Epidemics via a Fractional SEIRD Model and Physics-Informed Neural Networks

We develop a physics-informed neural network (PINN) framework for parameter estimation in fractional-order SEIRD epidemic models. By embedding the Caputo fractional derivative into the network residuals via the L1 discretization scheme, our…

Machine Learning · Statistics 2025-09-30 Achraf Zinihi

Diffusion Generative Models Meet Compressed Sensing, with Applications to Imaging and Finance

In this study we develop dimension-reduction techniques to accelerate diffusion model inference in the context of synthetic data generation. The idea is to integrate compressed sensing into diffusion models (hence, CSDM): First, compress…

Machine Learning · Statistics 2025-09-30 Zhengyi Guo, Jiatu Li, Wenpin Tang, David D. Yao

Flow Matching for Efficient and Scalable Data Assimilation

Data assimilation (DA) estimates a dynamical system's state from noisy observations. Recent generative models like the ensemble score filter (EnSF) improve DA in high-dimensional nonlinear settings but are computationally expensive. We…

Machine Learning · Statistics 2025-09-30 Taos Transue, Bohan Chen, So Takao, Bao Wang

Flexible and Efficient Drift Detection without Labels

Machine learning models are increasingly used to automate decisions in almost every domain, and maintaining the performance of these models is crucial for high-quality machine-learning-enabled services. Ensuring concept drift is…

Machine Learning · Statistics 2025-09-30 Nelvin Tan, Yu-Ching Shih, Dong Yang, Amol Salunkhe

Sobolev norm inconsistency of kernel interpolation

We study the consistency of minimum-norm interpolation in reproducing kernel Hilbert spaces corresponding to bounded kernels. Our main result gives lower bounds for the generalization error of the kernel interpolation measured in a…

Machine Learning · Statistics 2025-09-30 Yunfei Yang

Conformal prediction of future insurance claims in the regression problem

In the current insurance literature, prediction of insurance claims in the regression problem is often performed with a statistical model. This model-based approach may suffer from several drawbacks: (i) model misspecification,…

Machine Learning · Statistics 2025-09-30 Liang Hong

Gaussian Universality for Diffusion Models

We investigate Gaussian Universality for data distributions generated via diffusion models. By Gaussian Universality we mean that the test error of a generalized linear model $f(\mathbf{W})$ trained for a classification task on the…

Machine Learning · Statistics 2025-09-30 Reza Ghane, Anthony Bao, Danil Akhtiamov, Babak Hassibi

Gaussian Process Priors for Boundary Value Problems of Linear Partial Differential Equations

Working with systems of partial differential equations (PDEs) is a fundamental task in computational science. Well-posed systems are addressed by numerical solvers or neural operators, whereas systems described by data are often addressed…

Machine Learning · Statistics 2025-09-30 Jianlei Huang, Marc Härkönen, Markus Lange-Hegermann, Bogdan Raiţă