Machine Learning - Scifaro

PAC Learning with Improvements

One of the most basic lower bounds in machine learning is that in nearly any nontrivial setting, it takes $\textit{at least}$ $1/\epsilon$ samples to learn to error $\epsilon$ (and more, if the classifier being learned is complex). However,…

Machine Learning · Statistics 2025-06-04 Idan Attias, Avrim Blum, Keziah Naggita, Donya Saless, Dravyansh Sharma, Matthew Walter

A Hessian-Aware Stochastic Differential Equation for Modelling SGD

Continuous-time approximation of Stochastic Gradient Descent (SGD) is a crucial tool to study its escaping behaviors from stationary points. However, existing stochastic differential equation (SDE) models fail to fully capture these…

Machine Learning · Statistics 2025-06-04 Xiang Li, Zebang Shen, Liang Zhang, Niao He

Spectral Clustering for Directed Graphs via Likelihood Estimation on Stochastic Block Models

Graph clustering is a fundamental task in unsupervised learning with broad real-world applications. While spectral clustering methods for undirected graphs are well-established and guided by a minimum cut optimization consensus, their…

Machine Learning · Statistics 2025-06-04 Ning Zhang, Xiaowen Dong, Mihai Cucuringu

Fast and Multiphase Rates for Nearest Neighbor Classifiers

We study the scaling of classification error rates with respect to the size of the training dataset. In contrast to classical results where rates are minimax optimal for a problem class, this work starts with the empirical observation that,…

Machine Learning · Statistics 2025-06-04 Pengkun Yang, Jingzhao Zhang

Machine-Learned Sampling of Conditioned Path Measures

We propose algorithms for sampling from posterior path measures $P(C([0, T], \mathbb{R}^d))$ under a general prior process. This leverages ideas from (1) controlled equilibrium dynamics, which gradually transport between two path measures,…

Machine Learning · Statistics 2025-06-03 Qijia Jiang, Reuben Cohn-Gordon

Signature Maximum Mean Discrepancy Two-Sample Statistical Tests

Maximum Mean Discrepancy (MMD) is a widely used concept in machine learning research which has gained popularity in recent years as a highly effective tool for comparing (finite-dimensional) distributions. Since it is designed as a…

Machine Learning · Statistics 2025-06-03 Andrew Alden, Blanka Horvath, Zacharia Issa

Adversarial learning for nonparametric regression: Minimax rate and adaptive estimation

Despite tremendous advancements of machine learning models and algorithms in various application domains, they are known to be vulnerable to subtle, natural or intentionally crafted perturbations in future input data, known as adversarial…

Machine Learning · Statistics 2025-06-03 Jingfu Peng, Yuhong Yang

Projection Pursuit Density Ratio Estimation

Density ratio estimation (DRE) is a paramount task in machine learning, for its broad applications across multiple domains, such as covariate shift adaptation, causal inference, independence tests and beyond. Parametric methods for…

Machine Learning · Statistics 2025-06-03 Meilin Wang, Wei Huang, Mingming Gong, Zheng Zhang

Generalized Linear Markov Decision Process

The linear Markov Decision Process (MDP) framework offers a principled foundation for reinforcement learning (RL) with strong theoretical guarantees and sample efficiency. However, its restrictive assumption, that both transition dynamics…

Machine Learning · Statistics 2025-06-03 Sinian Zhang, Kaicheng Zhang, Ziping Xu, Tianxi Cai, Doudou Zhou

Score Matching With Missing Data

Score matching is a vital tool for learning the distribution of data with applications across many areas including diffusion processes, energy based modelling, and graphical model estimation. Despite all these applications, little work…

Machine Learning · Statistics 2025-06-03 Josh Givens, Song Liu, Henry W J Reeve

Off-Policy Evaluation of Ranking Policies via Embedding-Space User Behavior Modeling

Off-policy evaluation (OPE) in ranking settings with large ranking action spaces, which stems from an increase in both the number of unique actions and length of the ranking, is essential for assessing new recommender policies using only…

Machine Learning · Statistics 2025-06-03 Tatsuki Takahashi, Chihiro Maru, Hiroko Shoji

Label-shift robust federated feature screening for high-dimensional classification

Distributed and federated learning are important tools for high-dimensional classification of large datasets. To reduce computational costs and overcome the curse of dimensionality, feature screening plays a pivotal role in eliminating…

Machine Learning · Statistics 2025-06-03 Qi Qin, Erbo Li, Xingxiang Li, Yifan Sun, Wu Wang, Chen Xu

Beyond Winning: Margin of Victory Relative to Expectation Unlocks Accurate Skill Ratings

Knowledge of accurate relative skills in any competitive system is essential, but foundational approaches such as ELO discard extremely relevant performance data by concentrating exclusively on binary outcomes. While margin of victory (MOV)…

Machine Learning · Statistics 2025-06-03 Shivam Shorewala, Zihao Yang

Bayesian Data Sketching for Varying Coefficient Regression Models

Varying coefficient models are popular for estimating nonlinear regression functions in functional data models. Their Bayesian variants have received limited attention in large data applications, primarily due to prohibitively slow…

Machine Learning · Statistics 2025-06-03 Rajarshi Guhaniyogi, Laura Baracaldo, Sudipto Banerjee

Riemannian Principal Component Analysis

This paper proposes an innovative extension of Principal Component Analysis (PCA) that transcends the traditional assumption of data lying in Euclidean space, enabling its application to data on Riemannian manifolds. The primary challenge…

Machine Learning · Statistics 2025-06-03 Oldemar Rodríguez

Minimax Rates for the Estimation of Eigenpairs of Weighted Laplace-Beltrami Operators on Manifolds

We study the problem of estimating eigenpairs of elliptic differential operators from samples of a distribution $\rho$ supported on a manifold $M$. The operators discussed in the paper are relevant in unsupervised learning and in particular…

Machine Learning · Statistics 2025-06-03 Nicolás García Trillos, Chenghui Li, Raghavendra Venkatraman

New Lower Bounds for Stochastic Non-Convex Optimization through Divergence Decomposition

We study fundamental limits of first-order stochastic optimization in a range of nonconvex settings, including L-smooth functions satisfying Quasar-Convexity (QC), Quadratic Growth (QG), and Restricted Secant Inequalities (RSI). While the…

Machine Learning · Statistics 2025-06-03 El Mehdi Saad, Wei-Cheng Lee, Francesco Orabona

Generalized Bayesian deep reinforcement learning

Bayesian reinforcement learning (BRL) is a method that merges principles from Bayesian statistics and reinforcement learning to make optimal decisions in uncertain environments. As a model-based RL method, it has two key components: (1)…

Machine Learning · Statistics 2025-06-03 Shreya Sinha Roy, Richard G. Everitt, Christian P. Robert, Ritabrata Dutta

Self-supervised contrastive learning performs non-linear system identification

Self-supervised learning (SSL) approaches have brought tremendous success across many tasks and domains. It has been argued that these successes can be attributed to a link between SSL and identifiable representation learning: Temporal…

Machine Learning · Statistics 2025-06-03 Rodrigo González Laiz, Tobias Schmidt, Steffen Schneider

Understanding the Statistical Accuracy-Communication Trade-off in Personalized Federated Learning with Minimax Guarantees

Personalized federated learning (PFL) offers a flexible framework for aggregating information across distributed clients with heterogeneous data. This work considers a personalized federated learning setting that simultaneously learns…

Machine Learning · Statistics 2025-06-03 Xin Yu, Zelin He, Ying Sun, Lingzhou Xue, Runze Li