Machine Learning - Scifaro

Online Statistical Inference in Decision-Making with Matrix Context

Online decision-making problems that leverage contextual information have drawn notable attention due to their significant applications in fields ranging from healthcare to autonomous systems. In modern applications, contextual…

Machine Learning · Statistics 2025-04-22 Qiyu Han, Will Wei Sun, Yichen Zhang

Composite Goodness-of-fit Tests with Kernels

Model misspecification can create significant challenges for the implementation of probabilistic models, and this has led to the development of a range of robust methods which directly account for this issue. However, whether these more…

Machine Learning · Statistics 2025-04-22 Oscar Key, Arthur Gretton, François-Xavier Briol, Tamara Fernandez

Near-optimal algorithms for private estimation and sequential testing of collision probability

We present new algorithms for estimating and testing collision probability, a fundamental measure of the spread of a discrete distribution that is widely used in many scientific fields. We describe an algorithm that satisfies…

Machine Learning · Statistics 2025-04-21 Robert Busa-Fekete, Umar Syed

On the Convergence of Irregular Sampling in Reproducing Kernel Hilbert Spaces

We analyse the convergence of sampling algorithms for functions in reproducing kernel Hilbert spaces (RKHS). To this end, we discuss approximation properties of kernel regression under minimalistic assumptions on both the kernel and the…

Machine Learning · Statistics 2025-04-21 Armin Iske

Minimax Optimal Convergence of Gradient Descent in Logistic Regression via Large and Adaptive Stepsizes

We study gradient descent (GD) for logistic regression on linearly separable data with stepsizes that adapt to the current risk, scaled by a constant hyperparameter $\eta$. We show that after at most $1/\gamma^2$ burn-in steps,…

Machine Learning · Statistics 2025-04-21 Ruiqi Zhang, Jingfeng Wu, Licong Lin, Peter L. Bartlett

Statistical Inference in Reinforcement Learning: A Selective Survey

Reinforcement learning (RL) is concerned with how intelligent agents take actions in a given environment to maximize the cumulative reward they receive. In healthcare, applying RL algorithms could assist patients in improving their health…

Machine Learning · Statistics 2025-04-21 Chengchun Shi

Conformal Prediction Regions are Imprecise Highest Density Regions

Recently, Cella and Martin proved how, under an assumption called consonance, a credal set (i.e. a closed and convex set of probabilities) can be derived from the conformal transducer associated with transductive conformal prediction. We…

Machine Learning · Statistics 2025-04-21 Michele Caprio, Yusuf Sale, Eyke Hüllermeier

Optimal Transport for $\epsilon$-Contaminated Credal Sets: To the Memory of Sayan Mukherjee

We present generalized versions of Monge's and Kantorovich's optimal transport problems with the probabilities being transported replaced by lower probabilities. We show that, when the lower probabilities are the lower envelopes of…

Machine Learning · Statistics 2025-04-21 Michele Caprio

Symmetry-Based Structured Matrices for Efficient Approximately Equivariant Networks

There has been much recent interest in designing neural networks (NNs) with relaxed equivariance, which interpolate between exact equivariance and full flexibility for consistent performance gains. In a separate line of work, structured…

Machine Learning · Statistics 2025-04-21 Ashwin Samudre, Mircea Petrache, Brian D. Nord, Shubhendu Trivedi

Deep Huber quantile regression networks

Typical machine learning regression applications aim to report the mean or the median of the predictive probability distribution, via training with a squared or an absolute error scoring function. The importance of issuing predictions of…

Machine Learning · Statistics 2025-04-21 Hristos Tyralis, Georgia Papacharalampous, Nilay Dogulu, Kwok P. Chun

When do Random Forests work?

We study the effectiveness of randomizing split-directions in random forests. Prior literature has shown that, on the one hand, randomization can reduce variance through decorrelation, and, on the other hand, randomization regularizes and…

Machine Learning · Statistics 2025-04-18 C. Revelas, O. Boldea, B. J. M. Werker

Robust and Scalable Variational Bayes

We propose a robust and scalable framework for variational Bayes (VB) that effectively handles outliers and contamination of arbitrary nature in large datasets. Our approach divides the dataset into disjoint subsets, computes the posterior…

Machine Learning · Statistics 2025-04-18 Carlos Misael Madrid Padilla, Shitao Fan, Lizhen Lin

Applications of Statistical Field Theory in Deep Learning

Deep learning algorithms have made incredible strides in the past decade, yet due to their complexity, the science of deep learning remains in its early stages. Being an experimentally driven field, it is natural to seek a theory of deep…

Machine Learning · Statistics 2025-04-18 Zohar Ringel, Noa Rubin, Edo Mor, Moritz Helias, Inbar Seroussi

Sequential Kernelized Stein Discrepancy

We present a sequential version of the kernelized Stein discrepancy goodness-of-fit test, which allows for conducting goodness-of-fit tests for unnormalized densities that are continuously monitored and adaptively stopped. That is, the…

Machine Learning · Statistics 2025-04-18 Diego Martinez-Taboada, Aaditya Ramdas

ScoreFusion: Fusing Score-based Generative Models via Kullback-Leibler Barycenters

We introduce ScoreFusion, a theoretically grounded method for fusing multiple pre-trained diffusion models that are assumed to generate from auxiliary populations. ScoreFusion is particularly useful for enhancing the generative modeling of…

Machine Learning · Statistics 2025-04-18 Hao Liu, Junze Tony Ye, Jose Blanchet, Nian Si

Evaluation of Active Feature Acquisition Methods for Time-varying Feature Settings

Machine learning methods often assume that input features are available at no cost. However, in domains like healthcare, where acquiring features could be expensive or harmful, it is necessary to balance a feature's acquisition cost against…

Machine Learning · Statistics 2025-04-18 Henrik von Kleist, Alireza Zamanian, Ilya Shpitser, Narges Ahmidi

Towards the Theory of Unsupervised Federated Learning: Non-asymptotic Analysis of Federated EM Algorithms

While supervised federated learning approaches have enjoyed significant success, the domain of unsupervised federated learning remains relatively underexplored. Several federated EM algorithms have gained popularity in practice, however,…

Machine Learning · Statistics 2025-04-18 Ye Tian, Haolei Weng, Yang Feng

Neyman-Pearson Multi-class Classification via Cost-sensitive Learning

Most existing classification methods aim to minimize the overall misclassification error rate. However, in applications such as loan default prediction, different types of errors can have varying consequences. To address this asymmetry…

Machine Learning · Statistics 2025-04-18 Ye Tian, Yang Feng

Leave-One-Out Stable Conformal Prediction

Conformal prediction (CP) is an important tool for distribution-free predictive uncertainty quantification. Yet, a major challenge is to balance computational efficiency and prediction accuracy, particularly for multiple predictions. We…

Machine Learning · Statistics 2025-04-17 Kiljae Lee, Yuan Zhang

Approximation Bounds for Transformer Networks with Application to Regression

We explore the approximation capabilities of Transformer networks for Hölder and Sobolev functions, and apply these results to address nonparametric regression estimation with dependent observations. First, we establish novel upper bounds…

Machine Learning · Statistics 2025-04-17 Yuling Jiao, Yanming Lai, Defeng Sun, Yang Wang, Bokai Yan