Machine Learning - Scifaro

Error-quantified Conformal Inference for Time Series

Uncertainty quantification in time series prediction is challenging due to the temporal dependence and distribution shift on sequential data. Conformal inference provides a pivotal and flexible instrument for assessing the uncertainty of…

Machine Learning · Statistics 2025-09-09 Junxi Wu, Dongjian Hu, Yajie Bao, Shu-Tao Xia, Changliang Zou

Sequential Controlled Langevin Diffusions

An effective approach for sampling from unnormalized densities is based on the idea of gradually transporting samples from an easy prior to the complicated target distribution. Two popular methods are (1) Sequential Monte Carlo (SMC), where…

Machine Learning · Statistics 2025-09-09 Junhua Chen, Lorenz Richter, Julius Berner, Denis Blessing, Gerhard Neumann, Anima Anandkumar

Limit Theorems for Stochastic Gradient Descent with Infinite Variance

Stochastic gradient descent is a classic algorithm that has gained great popularity especially in the last decades as the most common approach for training models in machine learning. While the algorithm has been well-studied when…

Machine Learning · Statistics 2025-09-09 Jose Blanchet, Aleksandar Mijatović, Wenhao Yang

Confirmation Bias in Gaussian Mixture Models

Confirmation bias, the tendency to interpret information in a way that aligns with one's preconceptions, can profoundly impact scientific research, leading to conclusions that reflect the researcher's hypotheses even when the observational…

Machine Learning · Statistics 2025-09-09 Amnon Balanov, Tamir Bendory, Wasim Huleihel

Autoencoders in Function Space

Autoencoders have found widespread application in both their original deterministic form and in their variational formulation (VAEs). In scientific applications and in image processing it is often of interest to consider data that are…

Machine Learning · Statistics 2025-09-09 Justin Bunker, Mark Girolami, Hefin Lambley, Andrew M. Stuart, T. J. Sullivan

Effect of Random Learning Rate: Theoretical Analysis of SGD Dynamics in Non-Convex Optimization via Stationary Distribution

We consider a variant of the stochastic gradient descent (SGD) with a random learning rate and reveal its convergence properties. SGD is a widely used stochastic optimization algorithm in machine learning, especially deep learning. Numerous…

Machine Learning · Statistics 2025-09-09 Naoki Yoshida, Shogo Nakakita, Masaaki Imaizumi

Robust Generative Learning with Lipschitz-Regularized $\alpha$-Divergences Allows Minimal Assumptions on Target Distributions

This paper demonstrates the robustness of Lipschitz-regularized $\alpha$-divergences as objective functionals in generative modeling, showing they enable stable learning across a wide range of target distributions with minimal assumptions.…

Machine Learning · Statistics 2025-09-09 Ziyu Chen, Hyemin Gu, Markos A. Katsoulakis, Luc Rey-Bellet, Wei Zhu

On Rate-Optimal Partitioning Classification from Observable and from Privatised Data

In this paper we revisit the classical method of partitioning classification and study its convergence rate under relaxed conditions, both for observable (non-privatised) and for privatised data. We consider the problem of classification in…

Machine Learning · Statistics 2025-09-09 Balázs Csanád Csáji, László Györfi, Ambrus Tamás, Harro Walk

Spectral Algorithms in Misspecified Regression: Convergence under Covariate Shift

This paper investigates the convergence properties of spectral algorithms -- a class of regularization methods originating from inverse problems -- under covariate shift. In this setting, the marginal distributions of inputs differ between…

Machine Learning · Statistics 2025-09-08 Ren-Rui Liu, Zheng-Chu Guo

Optimal Variance and Covariance Estimation under Differential Privacy in the Add-Remove Model and Beyond

In this paper, we study the problem of estimating the variance and covariance of datasets under differential privacy in the add-remove model. While estimation in the swap model has been extensively studied in the literature, the add-remove…

Machine Learning · Statistics 2025-09-08 Shokichi Takakura, Seng Pei Liew, Satoshi Hasegawa

Test Set Sizing for the Ridge Regression

We derive the ideal train/test split for the ridge regression to high accuracy in the limit that the number of training rows m becomes large. The split must depend on the ridge tuning parameter, alpha, but we find that the dependence is…

Machine Learning · Statistics 2025-09-08 Alexander Dubbs

The Broader Landscape of Robustness in Algorithmic Statistics

The last decade has seen a number of advances in computationally efficient algorithms for statistical methods subject to robustness constraints. An estimator may be robust in a number of different ways: to contamination of the dataset, to…

Machine Learning · Statistics 2025-09-08 Gautam Kamath

Refined Risk Bounds for Unbounded Losses via Transductive Priors

We revisit the sequential variants of linear regression with the squared loss, classification problems with hinge loss, and logistic regression, all characterized by unbounded losses in the setup where no assumptions are made on the…

Machine Learning · Statistics 2025-09-08 Jian Qian, Alexander Rakhlin, Nikita Zhivotovskiy

Inferring Change Points in High-Dimensional Regression via Approximate Message Passing

We consider the problem of localizing change points in a generalized linear model (GLM), a model that covers many widely studied problems in statistical learning including linear, logistic, and rectified linear regression. We propose a…

Machine Learning · Statistics 2025-09-08 Gabriel Arpino, Xiaoqi Liu, Julia Gontarek, Ramji Venkataramanan

Survival Analysis with Adversarial Regularization

Survival Analysis (SA) models the time until an event occurs, with applications in fields like medicine, defense, finance, and aerospace. Recent research indicates that Neural Networks (NNs) can effectively capture complex data patterns in…

Machine Learning · Statistics 2025-09-08 Michael Potter, Stefano Maxenti, Michael Everett

Convolutional neural networks for valid and efficient causal inference

Convolutional neural networks (CNN) have been successful in machine learning applications. Their success relies on their ability to consider space invariant local features. We consider the use of CNN to fit nuisance models in semiparametric…

Machine Learning · Statistics 2025-09-08 Mohammad Ghasempour, Niloofar Moosavi, Xavier de Luna

Connections between reinforcement learning with feedback, test-time scaling, and diffusion guidance: An anthology

In this note, we reflect on several fundamental connections among widely used post-training techniques. We clarify some intimate connections and equivalences between reinforcement learning with human feedback, reinforcement learning with…

Machine Learning · Statistics 2025-09-05 Yuchen Jiao, Yuxin Chen, Gen Li

An invertible generative model for forward and inverse problems

We formulate the inverse problem in a Bayesian framework and aim to train a generative model that allows us to simulate (i.e., sample from the likelihood) and do inference (i.e., sample from the posterior). We review the use of triangular…

Machine Learning · Statistics 2025-09-05 Tristan van Leeuwen, Christoph Brune, Marcello Carioni

Testing for correlation between network structure and high-dimensional node covariates

In many application domains, networks are observed with node-level features. In such settings, a common problem is to assess whether or not nodal covariates are correlated with the network structure itself. Here, we present four novel…

Machine Learning · Statistics 2025-09-05 Alexander Fuchs-Kreiss, Keith Levin

Asymptotic convexity of wide and shallow neural networks

For a simple model of shallow and wide neural networks, we show that the epigraph of its input-output map as a function of the network parameters approximates the epigraph of a convex function in a precise sense. This leads to a plausible…

Machine Learning · Statistics 2025-09-05 Vivek Borkar, Parthe Pandit