Machine Learning - Scifaro

A Scalable Crawling Algorithm Utilizing Noisy Change-Indicating Signals

Web refresh crawling is the problem of keeping a cache of web pages fresh, that is, having the most recent copy available when a page is requested, given a limited bandwidth available to the crawler. Under the assumption that the change and…

Machine Learning · Statistics 2025-03-24 Róbert Busa-Fekete, Julian Zimmert, András György, Linhai Qiu, Tzu-Wei Sung, Hao Shen, Hyomin Choi, Sharmila Subramaniam, Li Xiao

Semi-Implicit Functional Gradient Flow for Efficient Sampling

Particle-based variational inference methods (ParVIs) use nonparametric variational families represented by particles to approximate the target distribution according to the kernelized Wasserstein gradient flow for the Kullback-Leibler (KL)…

Machine Learning · Statistics 2025-03-24 Shiyue Zhang, Ziheng Cheng, Cheng Zhang

Calibrated Computation-Aware Gaussian Processes

Gaussian processes are notorious for scaling cubically with the size of the training set, preventing application to very large regression problems. Computation-aware Gaussian processes (CAGPs) tackle this scaling issue by exploiting…

Machine Learning · Statistics 2025-03-24 Disha Hegde, Mohamed Adil, Jon Cockayne

Network reconstruction via the minimum description length principle

A fundamental problem associated with the task of network reconstruction from dynamical or behavioral data consists in determining the most appropriate model complexity in a manner that prevents overfitting, and produces an inferred network…

Machine Learning · Statistics 2025-03-24 Tiago P. Peixoto

Separation capacity of linear reservoirs with random connectivity matrix

A natural hypothesis for the success of reservoir computing in generic tasks is the ability of the untrained reservoir to map different input time series to separable reservoir states - a property we term separation capacity. We provide a…

Machine Learning · Statistics 2025-03-24 Youness Boutaib

Interpretable Neural Causal Models with TRAM-DAGs

The ultimate goal of most scientific studies is to understand the underlying causal mechanism between the involved variables. Structural causal models (SCMs) are widely used to represent such causal mechanisms. Given an SCM, causal queries…

Machine Learning · Statistics 2025-03-21 Beate Sick, Oliver Dürr

Generalization Guarantees for Representation Learning via Data-Dependent Gaussian Mixture Priors

We establish in-expectation and tail bounds on the generalization error of representation learning type algorithms. The bounds are in terms of the relative entropy between the distribution of the representations extracted from the training…

Machine Learning · Statistics 2025-03-21 Milad Sefidgaran, Abdellatif Zaidi, Piotr Krasnowski

Robustness of Nonlinear Representation Learning

We study the problem of unsupervised representation learning in slightly misspecified settings, and thus formalize the study of robustness of nonlinear representation learning. We focus on the case where the mixing is close to a local…

Machine Learning · Statistics 2025-03-20 Simon Buchholz, Bernhard Schölkopf

Online federated learning framework for classification

In this paper, we develop a novel online federated learning framework for classification, designed to handle streaming data from multiple clients while ensuring data privacy and computational efficiency. Our method leverages the generalized…

Machine Learning · Statistics 2025-03-20 Wenxing Guo, Jinhan Xie, Jianya Lu, Bei Jiang, Hongsheng Dai, Linglong Kong

Nonlinear Bayesian Update via Ensemble Kernel Regression with Clustering and Subsampling

Nonlinear Bayesian update for a prior ensemble is proposed to extend traditional ensemble Kalman filtering to settings characterized by non-Gaussian priors and nonlinear measurement operators. In this framework, the observed component is…

Machine Learning · Statistics 2025-03-20 Yoonsang Lee

The Hardness of Validating Observational Studies with Experimental Data

Observational data is often readily available in large quantities, but can lead to biased causal effect estimates due to the presence of unobserved confounding. Recent works attempt to remove this bias by supplementing observational data…

Machine Learning · Statistics 2025-03-20 Jake Fawkes, Michael O'Riordan, Athanasios Vlontzos, Oriol Corcoll, Ciarán Mark Gilligan-Lee

Variational Autoencoded Multivariate Spatial Fay-Herriot Models

Small area estimation models are essential for estimating population characteristics in regions with limited sample sizes, thereby supporting policy decisions, demographic studies, and resource allocation, among other use cases. The spatial…

Machine Learning · Statistics 2025-03-20 Zhenhua Wang, Paul A. Parker, Scott H. Holan

Efficient Optimization Algorithms for Linear Adversarial Training

Adversarial training can be used to learn models that are robust against perturbations. For linear models, it can be formulated as a convex optimization problem. Compared to methods proposed in the context of deep learning, leveraging the…

Machine Learning · Statistics 2025-03-20 Antônio H. Ribeiro, Thomas B. Schön, Dave Zachariah, Francis Bach

Implicit Bias of Mirror Flow for Shallow Neural Networks in Univariate Regression

We examine the implicit bias of mirror flow in univariate least squares error regression with wide and shallow neural networks. For a broad class of potential functions, we show that mirror flow exhibits lazy training and has the same…

Machine Learning · Statistics 2025-03-20 Shuang Liang, Guido Montúfar

Bayesian Circular Regression with von Mises Quasi-Processes

The need for regression models to predict circular values arises in many scientific fields. In this work we explore a family of expressive and interpretable distributions over circle-valued random functions related to Gaussian processes…

Machine Learning · Statistics 2025-03-20 Yarden Cohen, Alexandre Khae Wu Navarro, Jes Frellsen, Richard E. Turner, Raziel Riemer, Ari Pakman

A Metric-based Principal Curve Approach for Learning One-dimensional Manifold

The principal curve is a well-known statistical method for manifold learning that uses concepts from differential geometry. In this paper, we propose a novel metric-based principal curve (MPC) method that learns a one-dimensional manifold of…

Machine Learning · Statistics 2025-03-20 Eliuvish Cuicizion

Weighted-Sum of Gaussian Process Latent Variable Models

This work develops a Bayesian non-parametric approach to signal separation where the signals may vary according to latent variables. Our key contribution is to augment Gaussian Process Latent Variable Models (GPLVMs) for the case where each…

Machine Learning · Statistics 2025-03-20 James Odgers, Ruby Sedgwick, Chrysoula Kappatou, Ruth Misener, Sarah Filippi

Empirical risk minimization algorithm for multiclass classification of S.D.E. paths

We address the multiclass classification problem for stochastic diffusion paths, assuming that the classes are distinguished by their drift functions, while the diffusion coefficient remains common across all classes. In this setting, we…

Machine Learning · Statistics 2025-03-19 Christophe Denis, Eddy Ella Mintsa

Optimizing ML Training with Metagradient Descent

A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space. In this work, we unlock a gradient-based…

Machine Learning · Statistics 2025-03-19 Logan Engstrom, Andrew Ilyas, Benjamin Chen, Axel Feldmann, William Moses, Aleksander Madry

Bayesian Kernel Regression for Functional Data

In supervised learning, the output variable to be predicted is often represented as a function, such as a spectrum or probability distribution. Despite its importance, functional output regression remains relatively unexplored. In this…

Machine Learning · Statistics 2025-03-19 Minoru Kusaba, Megumi Iwayama, Ryo Yoshida