Machine Learning - Scifaro

Asymptotics of Non-Convex Generalized Linear Models in High-Dimensions: A proof of the replica formula

The analytic characterization of the high-dimensional behavior of optimization for Generalized Linear Models (GLMs) with Gaussian data has been a central focus in statistics and probability in recent years. While convex cases, such as the…

Machine Learning · Statistics 2026-01-13 Matteo Vilucchio, Yatin Dandi, Matéo Pirio Rossignol, Cedric Gerbelot, Florent Krzakala

Point processes with event time uncertainty

Point processes are widely used statistical models for continuous-time discrete event data, such as medical records, crime reports, and social network interactions, to capture the influence of historical events on future occurrences. In…

Machine Learning · Statistics 2026-01-13 Xiuyuan Cheng, Tingnan Gong, Yao Xie

Hierarchic Flows to Estimate and Sample High-dimensional Probabilities

Finding low-dimensional interpretable models of complex physical fields such as turbulence remains an open question, 80 years after the pioneering work of Kolmogorov. Estimating high-dimensional probability distributions from data samples…

Machine Learning · Statistics 2026-01-13 Etienne Lempereur, Stéphane Mallat

A High Dimensional Statistical Model for Adversarial Training: Geometry and Trade-Offs

This work investigates adversarial training in the context of margin-based linear classifiers in the high-dimensional regime where the dimension $d$ and the number of data points $n$ diverge with a fixed ratio $\alpha = n / d$. We introduce…

Machine Learning · Statistics 2026-01-13 Kasimir Tanner, Matteo Vilucchio, Bruno Loureiro, Florent Krzakala

Learning Operators with Stochastic Gradient Descent in General Hilbert Spaces

This study investigates leveraging stochastic gradient descent (SGD) to learn operators between general Hilbert spaces. We propose weak and strong regularity conditions for the target operator to depict its intrinsic structure and…

Machine Learning · Statistics 2026-01-13 Lei Shi, Jia-Qi Yang

A Convex Framework for Confounding Robust Inference

We study policy evaluation of offline contextual bandits subject to unobserved confounders. Sensitivity analysis methods are commonly used to estimate the policy value under the worst-case confounding over a given uncertainty set. However,…

Machine Learning · Statistics 2026-01-13 Kei Ishikawa, Niao He, Takafumi Kanamori

The Interpolating Information Criterion for Overparameterized Models

The problem of model selection is considered for the setting of interpolating estimators, where the number of model parameters exceeds the size of the dataset. Classical information criteria typically consider the large-data limit,…

Machine Learning · Statistics 2026-01-13 Liam Hodgkinson, Chris van der Heide, Robert Salomone, Fred Roosta, Michael W. Mahoney

Approximating Persistent Homology for Large Datasets

Persistent homology is an important methodology in topological data analysis which adapts theory from algebraic topology to data settings. Computing persistent homology produces persistence diagrams, which have been successfully used in…

Machine Learning · Statistics 2026-01-13 Yueqi Cao, Anthea Monod

Accumulation of Sub-Sampling Matrices with Applications to Statistical Computation

With appropriately chosen sampling probabilities, sampling-based random projection can be used to implement large-scale statistical methods, substantially reducing computational cost while maintaining low statistical error. However,…

Machine Learning · Statistics 2026-01-13 Yifan Chen, Yun Yang

Manifold limit for the training of shallow graph convolutional neural networks

We study the discrete-to-continuum consistency of the training of shallow graph convolutional neural networks (GCNNs) on proximity graphs of sampled point clouds under a manifold assumption. Graph convolution is defined spectrally via the…

Machine Learning · Statistics 2026-01-12 Johanna Tengler, Christoph Brune, José A. Iglesias

Multi-task Modeling for Engineering Applications with Sparse Data

Modern engineering and scientific workflows often require simultaneous predictions across related tasks and fidelity levels, where high-fidelity data is scarce and expensive, while low-fidelity data is more abundant. This paper introduces…

Machine Learning · Statistics 2026-01-12 Yigitcan Comlek, R. Murali Krishnan, Sandipp Krishnan Ravi, Amin Moghaddas, Rafael Giorjao, Michael Eff, Anirban Samaddar, Nesar S. Ramachandra, Sandeep Madireddy, Liping Wang

A brief note on learning problem with global perspectives

This brief note considers the problem of learning with dynamic-optimizing principal-agent setting, in which the agents are allowed to have global perspectives about the learning process, i.e., the ability to view things according to their…

Machine Learning · Statistics 2026-01-12 Getachew K. Befekadu

Machine learning assisted state prediction of misspecified linear dynamical system via modal reduction

Accurate prediction of structural dynamics is imperative for preserving digital twin fidelity throughout operational lifetimes. Parametric models with fixed nominal parameters often omit critical physical effects due to simplifications in…

Machine Learning · Statistics 2026-01-12 Rohan Vitthal Thorat, Rajdip Nayek

Next-Generation Reservoir Computing for Dynamical Inference

We present a simple and scalable implementation of next-generation reservoir computing (NGRC) for modeling dynamical systems from time-series data. The method uses a pseudorandom nonlinear projection of time-delay embedded inputs, allowing…

Machine Learning · Statistics 2026-01-12 Rok Cestnik, Erik A. Martens

Stochastic Deep Learning: A Probabilistic Framework for Modeling Uncertainty in Structured Temporal Data

I propose a novel framework that integrates stochastic differential equations (SDEs) with deep generative models to improve uncertainty quantification in machine learning applications involving structured and temporal data. This approach,…

Machine Learning · Statistics 2026-01-09 James Rice

Avoiding the Price of Adaptivity: Inference in Linear Contextual Bandits via Stability

Statistical inference in contextual bandits is challenging due to the adaptive, non-i.i.d. nature of the data. A growing body of work shows that classical least-squares inference can fail under adaptive sampling, and that valid confidence…

Machine Learning · Statistics 2026-01-09 Samya Praharaj, Koulik Khamaru

High-Dimensional Change Point Detection using Graph Spanning Ratio

Inspired by graph-based methodologies, we introduce a novel graph-spanning algorithm designed to identify changes in both offline and online data across low to high dimensions. This versatile approach is applicable to Euclidean and…

Machine Learning · Statistics 2026-01-09 Yang-Wen Sun, Katerina Papagiannouli, Vladimir Spokoiny

Structured Matching via Cost-Regularized Unbalanced Optimal Transport

Unbalanced optimal transport (UOT) provides a flexible way to match or compare nonnegative finite Radon measures. However, UOT requires a predefined ground transport cost, which may misrepresent the data's underlying geometry. Choosing such…

Machine Learning · Statistics 2026-01-09 Emanuele Pardini, Katerina Papagiannouli

Centroid Decision Forest

This paper introduces the centroid decision forest (CDF), a novel ensemble learning framework that redefines the splitting strategy and tree building in the ordinary decision trees for high-dimensional classification. The splitting approach…

Machine Learning · Statistics 2026-01-09 Amjad Ali, Saeed Aldahmani, Hailiang Du, Zardad Khan

A Theoretical and Empirical Taxonomy of Imbalance in Binary Classification

Class imbalance significantly degrades classification performance, yet its effects are rarely analyzed from a unified theoretical perspective. We propose a principled framework based on three fundamental scales: the imbalance coefficient…

Machine Learning · Statistics 2026-01-08 Rose Yvette Bandolo Essomba, Ernest Fokoué