Machine Learning - Scifaro

Scalable Posterior Uncertainty for Flexible Density-Based Clustering

We introduce a novel framework for uncertainty quantification in clustering that combines martingale posterior distributions with density-based clustering. Unlike classical model-based approaches, which define clusters at the latent level…

Machine Learning · Statistics 2026-04-20 Nicola Bariletto, Stephen G. Walker

Robustness Verification of Polynomial Neural Networks

We study robustness verification of neural networks via metric algebraic geometry. For polynomial neural networks, certifying a robustness radius amounts to computing the distance to the algebraic decision boundary. We use the Euclidean…

Machine Learning · Statistics 2026-04-20 Yulia Alexandr, Hao Duan, Guido Montúfar

Sequential Regression Learning with Randomized Algorithms

This paper presents "randomized SINDy", a sequential machine learning algorithm designed for dynamic data that has a time-dependent structure. It employs a probabilistic approach, with its PAC learning property rigorously proven through…

Machine Learning · Statistics 2026-04-20 Dorival Leão, Reiko Aoki, Alberto Ohashi, Teh Led Red

Two-Dimensional Deep ReLU CNN Approximation for Korobov Functions: A Constructive Approach

This paper investigates approximation capabilities of two-dimensional (2D) deep convolutional neural networks (CNNs), with Korobov functions serving as a benchmark. We focus on 2D CNNs, comprising multi-channel convolutional layers with…

Machine Learning · Statistics 2026-04-20 Qin Fang, Lei Shi, Min Xu, Ding-Xuan Zhou

Structural interpretability in SVMs with truncated orthogonal polynomial kernels

We study post-training interpretability for Support Vector Machines (SVMs) built from truncated orthogonal polynomial kernels. Since the associated reproducing kernel Hilbert space is finite-dimensional and admits an explicit tensor-product…

Machine Learning · Statistics 2026-04-17 Víctor Soto-Larrosa, Nuria Torrado, Edmundo J. Huertas

Amortized Optimal Transport from Sliced Potentials

We propose a novel amortized optimization method for predicting optimal transport (OT) plans across multiple pairs of measures by leveraging Kantorovich potentials derived from sliced OT. We introduce two amortization strategies:…

Machine Learning · Statistics 2026-04-17 Minh-Phuc Truong, Khai Nguyen

MinShap: A Modified Shapley Value Approach for Feature Selection

Feature selection is a classical problem in statistics and machine learning, and it remains extremely challenging, especially in the context of unknown non-linear relationships with dependent features. On the other…

Machine Learning · Statistics 2026-04-17 Chenghui Zheng, Garvesh Raskutti

Best of both worlds: Stochastic & adversarial best-arm identification

We study bandit best-arm identification with arbitrary and potentially adversarial rewards. A simple random uniform learner obtains the optimal rate of error in the adversarial scenario. However, this type of strategy is suboptimal when the…

Machine Learning · Statistics 2026-04-17 Yasin Abbasi-Yadkori, Peter L. Bartlett, Victor Gabillon, Alan Malek, Michal Valko

Scalable Model-Based Clustering with Sequential Monte Carlo

In online clustering problems, there is often a large amount of uncertainty over possible cluster assignments that cannot be resolved until more data are observed. This difficulty is compounded when clusters follow complex distributions, as…

Machine Learning · Statistics 2026-04-17 Connie Trojan, Pavel Myshkov, Paul Fearnhead, James Hensman, Tom Minka, Christopher Nemeth

Expert-Guided Class-Conditional Goodness-of-Fit Scores for Interpretable Classification with Informative Missingness: An Application to Seismic Monitoring

We study a classification problem with three key challenges: pervasive informative missingness, the integration of partial prior expert knowledge into the learning process, and the need for interpretable decision rules. We propose a…

Machine Learning · Statistics 2026-04-17 Shahar Cohen, David M. Steinberg, Yael Radzyner, Yochai Ben Horin

Random Matrix Theory for Deep Learning: Beyond Eigenvalues of Linear Models

Modern Machine Learning (ML) and Deep Neural Networks (DNNs) often operate on high-dimensional data and rely on overparameterized models, where classical low-dimensional intuitions break down. In particular, the proportional regime where…

Machine Learning · Statistics 2026-04-17 Zhenyu Liao, Michael W. Mahoney

Covariance-adapting algorithm for semi-bandits with application to sparse rewards

We investigate stochastic combinatorial semi-bandits, where the entire joint distribution of outcomes impacts the complexity of the problem instance (unlike in standard bandits). Typical distributions considered depend on specific…

Machine Learning · Statistics 2026-04-16 Pierre Perrault, Vianney Perchet, Michal Valko

Robust Low-Rank Tensor Completion based on M-product with Weighted Correlated Total Variation and Sparse Regularization

The robust low-rank tensor completion problem addresses the challenge of recovering corrupted high-dimensional tensor data with missing entries, outliers, and sparse noise commonly found in real-world applications. Existing methodologies…

Machine Learning · Statistics 2026-04-16 Biswarup Karmakar, Ratikanta Behera

Joint Representation Learning and Clustering via Gradient-Based Manifold Optimization

Clustering and dimensionality reduction have been crucial topics in machine learning and computer vision. Clustering high-dimensional data has been challenging for a long time due to the curse of dimensionality. For that reason, a more…

Machine Learning · Statistics 2026-04-16 Sida Liu, Yangzi Guo, Mingyuan Wang

Identifiability of Potentially Degenerate Gaussian Mixture Models With Piecewise Affine Mixing

Causal representation learning (CRL) aims to identify the underlying latent variables from high-dimensional observations, even when the variables are dependent on each other. We study this problem for latent variables that follow a…

Machine Learning · Statistics 2026-04-16 Danru Xu, Sébastien Lachapelle, Sara Magliacane

Rare Event Analysis via Stochastic Optimal Control

Rare events such as conformational changes in biomolecules, phase transitions, and chemical reactions are central to the behavior of many physical systems, yet they are extremely difficult to study computationally because unbiased…

Machine Learning · Statistics 2026-04-16 Yuanqi Du, Jiajun He, Dinghuai Zhang, Eric Vanden-Eijnden, Carles Domingo-Enrich

Cost-optimal Sequential Testing via Doubly Robust Q-learning

Clinical decision-making often involves selecting tests that are costly, invasive, or time-consuming, motivating individualized, sequential strategies for what to measure and when to stop ascertaining. We study the problem of learning…

Machine Learning · Statistics 2026-04-16 Doudou Zhou, Yiran Zhang, Dian Jin, Yingye Zheng, Lu Tian, Tianxi Cai

Mini-Batch Covariance, Diffusion Limits, and Oracle Complexity in Stochastic Gradient Descent: A Sampling-Design Perspective

Stochastic gradient descent (SGD) is central to simulation optimization, stochastic programming, and online M-estimation, where sampling effort is a decision variable. We study the mini-batch gradient noise as a sampling-design object.…

Machine Learning · Statistics 2026-04-16 Daniel Zantedeschi, Kumar Muthuraman

Random Walk Learning and the Pac-Man Attack

Random walk (RW)-based algorithms have long been popular in distributed systems due to low overheads and scalability, with recent growing applications in decentralized learning. However, their reliance on local interactions makes them…

Machine Learning · Statistics 2026-04-16 Xingran Chen, Parimal Parag, Rohit Bhagat, Zonghong Liu, Salim El Rouayheb

Flow-based Generative Modeling of Potential Outcomes and Counterfactuals

Predicting potential and counterfactual outcomes from observational data is central to individualized decision-making, particularly in clinical settings where treatment choices must be tailored to each patient rather than guided solely by…

Machine Learning · Statistics 2026-04-16 Dongze Wu, David I. Inouye, Yao Xie