Machine Learning - Scifaro

Data-Driven Model Reduction using WeldNet: Windowed Encoders for Learning Dynamics

Many problems in science and engineering involve time-dependent, high dimensional datasets arising from complex physical processes, which are costly to simulate. In this work, we propose WeldNet: Windowed Encoders for Learning Dynamics, a…

Machine Learning · Statistics 2025-12-15 Biraj Dahal, Jiahui Cheng, Hao Liu, Rongjie Lai, Wenjing Liao

Provable Recovery of Locally Important Signed Features and Interactions from Random Forest

Feature and Interaction Importance (FII) methods are essential in supervised learning for assessing the relevance of input variables and their interactions in complex prediction models. In many domains, such as personalized medicine, local…

Machine Learning · Statistics 2025-12-15 Kata Vuk, Nicolas Alexander Ihlo, Merle Behr

An Efficient Variant of One-Class SVM with Lifelong Online Learning Guarantees

We study outlier (a.k.a. anomaly) detection for single-pass non-stationary streaming data. In the well-studied offline or batch outlier detection problem, traditional methods such as kernel One-Class SVM (OCSVM) are both computationally…

Machine Learning · Statistics 2025-12-15 Joe Suk, Samory Kpotufe

STARK denoises spatial transcriptomics images via adaptive regularization

We present an approach to denoising spatial transcriptomics images that is particularly effective for uncovering cell identities in the regime of ultra-low sequencing depths, and also allows for interpolation of gene expression. The method…

Machine Learning · Statistics 2025-12-15 Sharvaj Kubal, Naomi Graham, Matthieu Heitz, Andrew Warren, Michael P. Friedlander, Yaniv Plan, Geoffrey Schiebinger

Statistical physics of deep learning: Optimal learning of a multi-layer perceptron near interpolation

For four decades statistical physics has been providing a framework to analyse neural networks. A long-standing question remained on its capacity to tackle deep learning models capturing rich feature learning effects, thus going beyond the…

Machine Learning · Statistics 2025-12-15 Jean Barbier, Francesco Camilli, Minh-Toan Nguyen, Mauro Pastore, Rudy Skerk

Statistical Inference for Differentially Private Stochastic Gradient Descent

Privacy preservation in machine learning, particularly through Differentially Private Stochastic Gradient Descent (DP-SGD), is critical for sensitive data analysis. However, existing statistical inference methods for SGD predominantly focus…

Machine Learning · Statistics 2025-12-15 Xintao Xia, Linjun Zhang, Zhanrui Cai

HyperSBINN: A Hypernetwork-Enhanced Systems Biology-Informed Neural Network for Efficient Drug Cardiosafety Assessment

Mathematical modeling in systems toxicology enables a comprehensive understanding of the effects of pharmaceutical substances on cardiac health. However, the complexity of these models limits their widespread application in early drug…

Machine Learning · Statistics 2025-12-15 Inass Soukarieh, Gerhard Hessler, Hervé Minoux, Marcel Mohr, Friedemann Schmidt, Jan Wenzel, Pierre Barbillon, Hugo Gangloff, Pierre Gloaguen

Physics-informed Polynomial Chaos Expansion with Enhanced Constrained Optimization Solver and D-optimal Sampling

Physics-informed polynomial chaos expansions (PC$^2$) provide an efficient physically constrained surrogate modeling framework by embedding governing equations and other physical constraints into the standard data-driven polynomial chaos…

Machine Learning · Statistics 2025-12-12 Qitian Lu, Himanshu Sharma, Michael D. Shields, Lukáš Novák

Supervised Learning of Random Neural Architectures Structured by Latent Random Fields on Compact Boundaryless Multiply-Connected Manifolds

This paper introduces a new probabilistic framework for supervised learning in neural systems. It is designed to model complex, uncertain systems whose random outputs are strongly non-Gaussian given deterministic inputs. The architecture…

Machine Learning · Statistics 2025-12-12 Christian Soize

Error Analysis of Generalized Langevin Equations with Approximated Memory Kernels

We analyze prediction error in stochastic dynamical systems with memory, focusing on generalized Langevin equations (GLEs) formulated as stochastic Volterra equations. We establish that, under a strongly convex potential, trajectory…

Machine Learning · Statistics 2025-12-12 Quanjun Lang, Jianfeng Lu

The Interplay of Statistics and Noisy Optimization: Learning Linear Predictors with Random Data Weights

We analyze gradient descent with randomly weighted data points in a linear regression model, under a generic weighting distribution. This includes various forms of stochastic gradient descent, importance sampling, but also extends to…

Machine Learning · Statistics 2025-12-12 Gabriel Clara, Yazan Mash'al

LxCIM: a new rank-based binary classifier performance metric invariant to local exchange of classes

Binary classification is one of the oldest, most prevalent, and studied problems in machine learning. However, the metrics used to evaluate model performance have received comparatively little attention. The area under the receiver…

Machine Learning · Statistics 2025-12-12 Tiago Brogueira, Mário A. T. Figueiredo

Sublinear Variational Optimization of Gaussian Mixture Models with Millions to Billions of Parameters

Gaussian Mixture Models (GMMs) range among the most frequently used models in machine learning. However, training large, general GMMs becomes computationally prohibitive for datasets that have many data points $N$ of high-dimensionality…

Machine Learning · Statistics 2025-12-12 Sebastian Salwig, Till Kahlke, Florian Hirschberger, Dennis Forster, Jörg Lücke

Beyond Log-Concavity and Score Regularity: Improved Convergence Bounds for Score-Based Generative Models in W2-distance

Score-based Generative Models (SGMs) aim to sample from a target distribution by learning score functions using samples perturbed by Gaussian noise. Existing convergence bounds for SGMs in the W2-distance rely on stringent assumptions about…

Machine Learning · Statistics 2025-12-12 Marta Gentiloni-Silveri, Antonio Ocello

Supervised learning pays attention

In-context learning with attention enables large neural networks to make context-specific predictions by selectively focusing on relevant examples. Here, we adapt this idea to supervised learning procedures such as lasso regression and…

Machine Learning · Statistics 2025-12-11 Erin Craig, Robert Tibshirani

Estimation of Stochastic Optimal Transport Maps

The optimal transport (OT) map is a geometry-driven transformation between high-dimensional probability distributions which underpins a wide range of tasks in statistics, applied probability, and machine learning. However, existing…

Machine Learning · Statistics 2025-12-11 Sloan Nietert, Ziv Goldfeld

Robust and Sparse Estimation of Unbounded Density Ratio under Heavy Contamination

We examine the non-asymptotic properties of robust density ratio estimation (DRE) in contaminated settings. Weighted DRE is the most promising among existing methods, exhibiting doubly strong robustness from an asymptotic perspective. This…

Machine Learning · Statistics 2025-12-11 Ryosuke Nagumo, Hironori Fujisawa

WTNN: Weibull-Tailored Neural Networks for survival analysis

The Weibull distribution is a commonly adopted choice for modeling the survival of systems subject to maintenance over time. When only proxy indicators and censored observations are available, it becomes necessary to express the…

Machine Learning · Statistics 2025-12-11 Gabrielle Rives, Olivier Lopez, Nicolas Bousquet

Online Inference of Constrained Optimization: Primal-Dual Optimality and Sequential Quadratic Programming

We study online statistical inference for the solutions of stochastic optimization problems with equality and inequality constraints. Such problems are prevalent in statistics and machine learning, encompassing constrained $M$-estimation,…

Machine Learning · Statistics 2025-12-11 Yihang Gao, Michael K. Ng, Michael W. Mahoney, Sen Na

Function-on-Function Bayesian Optimization

Bayesian optimization (BO) has been widely used to optimize expensive and gradient-free objective functions across various domains. However, existing BO methods have not addressed the objective where both inputs and outputs are functions,…

Machine Learning · Statistics 2025-12-11 Jingru Huang, Haijie Xu, Manrui Jiang, Chen Zhang