Machine Learning - Scifaro

Spectral-Transport Stability and Benign Overfitting in Interpolating Learning

We develop a theoretical framework for generalization in the interpolating regime of statistical learning. The central question is why highly overparameterized estimators can attain zero empirical risk while still achieving nontrivial…

Machine Learning · Statistics 2026-04-13 Gustav Olaf Yunus Laitinen-Lundström Fredriksson-Imanov

Biconvex Biclustering

This article proposes a biconvex modification to convex biclustering in order to improve its performance in high-dimensional settings. In contrast to heuristics that discard a subset of noisy features a priori, our method jointly learns and…

Machine Learning · Statistics 2026-04-13 Sam Rosen, Eric C. Chi, Jason Xu

Distribution-free two-sample testing with blurred total variation distance

Two-sample testing, where we aim to determine whether two distributions are equal or not equal based on samples from each one, is challenging if we cannot place assumptions on the properties of the two distributions. In particular,…

Machine Learning · Statistics 2026-04-13 Rohan Hore, Rina Foygel Barber

Differentially Private and Federated Structure Learning in Bayesian Networks

Learning the structure of a Bayesian network from decentralized data poses two major challenges: (i) ensuring rigorous privacy guarantees for participants, and (ii) avoiding communication costs that scale poorly with dimensionality. In this…

Machine Learning · Statistics 2026-04-13 Ghita Fassy El Fehri, Aurélien Bellet, Philippe Bastien

MIBoost: A gradient boosting algorithm for variable selection after multiple imputation

Statistical learning methods for automated variable selection, such as the Least Absolute Shrinkage and Selection Operator (LASSO), elastic nets, and gradient boosting, have become increasingly popular tools for building powerful prediction…

Machine Learning · Statistics 2026-04-13 Robert Kuchen

GL-LowPopArt: A Nearly Instance-Wise Minimax-Optimal Estimator for Generalized Low-Rank Trace Regression

We present `GL-LowPopArt`, a novel Catoni-style estimator for generalized low-rank trace regression. Building on `LowPopArt` (Jang et al., 2024), it employs a two-stage approach: nuclear norm regularization followed by matrix Catoni…

Machine Learning · Statistics 2026-04-13 Junghyun Lee, Kyoungseok Jang, Kwang-Sung Jun, Milan Vojnović, Se-Young Yun
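GL-LowPopArt's second stage applies a matrix Catoni estimator. As a hedged illustration of the robustness idea only (not the paper's matrix estimator), the sketch below computes Catoni's classical scalar mean estimator: the root θ of Σᵢ ψ(α(xᵢ - θ)) = 0 with ψ(x) = sign(x)·log(1 + |x| + x²/2), found by bisection. The function names and the scale parameter `alpha` are illustrative assumptions; in theory α is tuned from a variance bound and a confidence level.

```python
import math

def catoni_psi(x):
    """Catoni's influence function: sign(x) * log(1 + |x| + x^2 / 2).
    It grows only logarithmically, so large residuals have bounded pull."""
    return math.copysign(math.log1p(abs(x) + 0.5 * x * x), x)

def catoni_mean(samples, alpha, tol=1e-10):
    """Root of sum_i psi(alpha * (x_i - theta)) = 0, found by bisection.

    The sum is non-increasing in theta, non-negative at min(samples) and
    non-positive at max(samples), so a root lies between them.
    """
    lo, hi = min(samples), max(samples)

    def score(theta):
        return sum(catoni_psi(alpha * (x - theta)) for x in samples)

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid  # score still positive: the root is to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

data = [1.0, 1.2, 0.9, 1.1, 0.8, 50.0]  # hypothetical data, one gross outlier
print(sum(data) / len(data))            # plain mean, pulled toward 50
print(catoni_mean(data, alpha=1.0))     # stays much closer to the bulk
```

Roughly speaking, a larger `alpha` makes the estimate more median-like and outlier-resistant, while `alpha` near 0 recovers the ordinary sample mean.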

Conformal Prediction in Hierarchical Classification with Constrained Representation Complexity

Conformal prediction has emerged as a widely used framework for constructing valid prediction sets in classification and regression tasks. In this work, we extend the split conformal prediction framework to hierarchical classification,…

Machine Learning · Statistics 2026-04-13 Thomas Mortier, Alireza Javanmardi, Yusuf Sale, Eyke Hüllermeier, Willem Waegeman
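The entry above extends split conformal prediction. For reference, here is a minimal sketch of the standard flat (non-hierarchical) split conformal classifier, assuming a pre-trained model that outputs class probabilities and the common nonconformity score 1 - p̂(y | x). The hierarchical extension and the representation-complexity constraint from the paper are not shown; all names and numbers are hypothetical.

```python
import math

def conformal_quantile(scores, alpha):
    """(1 - alpha) quantile of calibration scores with the finite-sample
    correction: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: every label is kept
    return sorted(scores)[k - 1]

def prediction_set(probs, qhat):
    """Labels whose nonconformity score 1 - p(y | x) does not exceed qhat."""
    return {label for label, p in probs.items() if 1.0 - p <= qhat}

# Hypothetical calibration scores 1 - p(true label | x) on held-out data.
cal_scores = [0.1, 0.3, 0.2, 0.6, 0.15, 0.4, 0.05, 0.35, 0.5]
qhat = conformal_quantile(cal_scores, alpha=0.1)
print(prediction_set({"cat": 0.55, "dog": 0.42, "fox": 0.03}, qhat))
```

If calibration and test points are exchangeable, sets built this way contain the true label with probability at least 1 - α (marginally over the data).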

Differentially Private Language Generation and Identification in the Limit

We initiate the study of language generation in the limit, a model recently introduced by Kleinberg and Mullainathan [KM24], under the constraint of differential privacy. We consider the continual release model, where a generator must…

Machine Learning · Statistics 2026-04-10 Anay Mehrotra, Grigoris Velegkas, Xifan Yu, Felix Zhou

Intensity Dot Product Graphs

Latent-position random graph models usually treat the node set as fixed once the sample size is chosen, while graphon-based and random-measure constructions allow more randomness at the cost of weaker geometric interpretability. We…

Machine Learning · Statistics 2026-04-10 Giulio Valentino Dalla Riva, Matteo Dalla Riva

Sparse $\epsilon$ insensitive zone bounded asymmetric elastic net support vector machines for pattern classification

Existing support vector machine (SVM) models are sensitive to noise and lack sparsity, which limits their performance. To address these issues, we combine the elastic net loss with a robust loss framework to construct a sparse…

Machine Learning · Statistics 2026-04-10 Haiyan Du, Hu Yang

The Condition-Number Principle for Prototype Clustering

We develop a geometric framework that links objective accuracy to structural recovery in prototype-based clustering. The analysis is algorithm-agnostic and applies to a broad class of admissible loss functions. We define a clustering…

Machine Learning · Statistics 2026-04-10 Romano Li, Jianfei Cao

On the Unique Recovery of Transport Maps and Vector Fields from Finite Measure-Valued Data

We establish guarantees for the unique recovery of vector fields and transport maps from finite measure-valued data, yielding new insights into generative models, data-driven dynamical systems, and PDE inverse problems. In particular, we…

Machine Learning · Statistics 2026-04-10 Jonah Botvinick-Greenhouse, Yunan Yang

Variational Approximated Restricted Maximum Likelihood Estimation for Spatial Data

This research considers scalable inference for spatial data modeled through Gaussian intrinsic conditional autoregressive (ICAR) structures. The classical estimation method, restricted maximum likelihood (REML), requires repeated…

Machine Learning · Statistics 2026-04-10 Debjoy Thakur

NS-RGS: Newton-Schulz based Riemannian gradient method for orthogonal group synchronization

Group synchronization is a fundamental task involving the recovery of group elements from pairwise measurements. For orthogonal group synchronization, the most common approach reformulates the problem as a constrained nonconvex optimization…

Machine Learning · Statistics 2026-04-10 Haiyang Peng, Deren Han, Xin Chen, Meng Huang
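The title indicates that NS-RGS relies on Newton-Schulz iterations. The sketch below shows only the classic cubic Newton-Schulz iteration X ← X(3I - XᵀX)/2, which maps a nonsingular matrix to its orthogonal polar factor without an explicit SVD; how the paper combines it with Riemannian gradient steps is not shown. Plain Python lists keep it dependency-free; the function names and the iteration count are illustrative assumptions.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def newton_schulz_orthogonalize(a, iters=30):
    """Approximate the orthogonal polar factor U V^T of a nonsingular
    square matrix a = U S V^T via X <- X (3I - X^T X) / 2."""
    # Dividing by the Frobenius norm puts every singular value in (0, 1],
    # inside the (0, sqrt(3)) region where the iteration converges.
    fro = math.sqrt(sum(v * v for row in a for v in row))
    x = [[v / fro for v in row] for row in a]
    n = len(x[0])
    for _ in range(iters):
        xtx = matmul(transpose(x), x)
        m = [[(3.0 if i == j else 0.0) - xtx[i][j] for j in range(n)]
             for i in range(n)]
        x = [[0.5 * v for v in row] for row in matmul(x, m)]
    return x

a = [[2.0, 1.0], [0.5, 1.5]]
q = newton_schulz_orthogonalize(a)
print(matmul(transpose(q), q))  # numerically close to the 2x2 identity
```

Because A = QH with H symmetric positive definite, QᵀA should also come out symmetric up to rounding, a quick check that the iteration found the polar factor rather than an arbitrary orthogonal matrix.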

tBayes-MICE: A Bayesian Approach to Multiple Imputation for Time Series Data

Time-series analysis is often affected by missing data, a common problem across several fields, including healthcare and environmental monitoring. Multiple Imputation by Chained Equations (MICE) has been prominent for imputing missing…

Machine Learning · Statistics 2026-04-10 Amuche Ibenegbu, Pierre Lafaye de Micheaux, Rohitash Chandra

Flow Matching is Adaptive to Manifold Structures

Flow matching has emerged as a simulation-free alternative to diffusion-based generative modeling, producing samples by solving an ODE whose time-dependent velocity field is learned along an interpolation between a simple source…

Machine Learning · Statistics 2026-04-10 Shivam Kumar, Yixin Wang, Lizhen Lin

Evaluating Singular Value Thresholds for DNN Weight Matrices based on Random Matrix Theory

This study evaluates thresholds for removing singular values from singular value decomposition-based low-rank approximations of deep neural network weight matrices. Each weight matrix is modeled as the sum of signal and noise matrices. The…

Machine Learning · Statistics 2026-04-10 Kohei Nishikawa, Koki Shimizu, Hiroki Hashiguchi
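The entry above models each weight matrix as signal plus noise. One classical random-matrix threshold, assumed here for illustration and not necessarily among those the paper evaluates, comes from the edge of the noise bulk: for a large m × n noise matrix with i.i.d. zero-mean entries of variance σ², the largest singular value is close to σ(√m + √n). Singular values above that edge are kept as signal. The sketch takes the singular values and σ as given; estimating σ from the spectrum is not shown, and the function names are hypothetical.

```python
import math

def noise_edge(m, n, sigma):
    """Approximate largest singular value of an m x n pure-noise matrix
    with i.i.d. entries of mean 0 and variance sigma**2."""
    return sigma * (math.sqrt(m) + math.sqrt(n))

def truncate_spectrum(singular_values, m, n, sigma):
    """Keep singular values above the noise edge and zero out the rest."""
    edge = noise_edge(m, n, sigma)
    return [s if s > edge else 0.0 for s in singular_values]

# Hypothetical spectrum of a 100 x 100 weight matrix with noise level 0.1.
svals = [5.3, 3.1, 2.4, 1.98, 1.7, 0.4]
kept = truncate_spectrum(svals, m=100, n=100, sigma=0.1)
rank = sum(1 for s in kept if s > 0.0)
print(rank, kept)
```

Rebuilding W from the singular triplets that survive gives the corresponding low-rank approximation.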

Physics-Informed Neural Networks for Joint Source and Parameter Estimation in Advection-Diffusion Equations

Recent studies have demonstrated the success of deep learning in solving forward and inverse problems in engineering and scientific computing domains, such as physics-informed neural networks (PINNs). Source inversion problems under sparse…

Machine Learning · Statistics 2026-04-10 Brenda Anague, Bamdad Hosseini, Issa Karambal, Jean Medard Ngnotchouye

A Probabilistic Formulation of Offset Noise in Diffusion Models

Diffusion models have become fundamental tools for modeling data distributions in machine learning. Despite their success, these models face challenges when generating data with extreme brightness values, as evidenced by limitations…

Machine Learning · Statistics 2026-04-10 Takuro Kutsuna

Gaussian Approximation for Asynchronous Q-learning

In this paper, we derive rates of convergence in the high-dimensional central limit theorem for Polyak-Ruppert averaged iterates generated by the asynchronous Q-learning algorithm with a polynomial stepsize $k^{-\omega},\, \omega \in (1/2,…

Machine Learning · Statistics 2026-04-09 Artemy Rubtsov, Sergey Samsonov, Vladimir Ulyanov, Alexey Naumov
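The entry above studies Polyak-Ruppert averaged iterates of asynchronous Q-learning with stepsize k^{-ω}. A minimal sketch of those iterates follows; it illustrates only the algorithm, not the paper's Gaussian-approximation results. Everything concrete here is a hypothetical assumption: a 2-state, 2-action deterministic MDP, a uniformly random behavior policy, ω = 0.7, γ = 0.5, and k counted as the global step number.

```python
import random

def averaged_async_q_learning(n_steps=100_000, omega=0.7, gamma=0.5, seed=0):
    """Asynchronous tabular Q-learning on a toy MDP, with Polyak-Ruppert
    averaging of the iterates.

    Toy MDP (hypothetical): taking action a moves to state a; the reward
    is 1 only for action 1 in state 1. Actions are chosen uniformly.
    """
    rng = random.Random(seed)
    pairs = [(s, a) for s in (0, 1) for a in (0, 1)]
    q = {p: 0.0 for p in pairs}      # current iterate Q_k
    q_bar = {p: 0.0 for p in pairs}  # running mean of Q_1, ..., Q_k
    state = 0
    for k in range(1, n_steps + 1):
        action = rng.randrange(2)
        reward = 1.0 if (state, action) == (1, 1) else 0.0
        next_state = action
        step = k ** (-omega)  # polynomial stepsize k^{-omega}
        target = reward + gamma * max(q[(next_state, b)] for b in (0, 1))
        # Asynchronous: only the visited (state, action) entry is updated.
        q[(state, action)] += step * (target - q[(state, action)])
        for p in pairs:  # online update of the Polyak-Ruppert average
            q_bar[p] += (q[p] - q_bar[p]) / k
        state = next_state
    return q, q_bar

q_last, q_avg = averaged_async_q_learning()
print(q_avg)
```

For this toy MDP the optimal values are Q*(0,0) = Q*(1,0) = 0.5, Q*(0,1) = 1 and Q*(1,1) = 2, so both the last iterate and the averaged iterate can be checked against them.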