Machine Learning - Scifaro

Privacy utility trade offs for parameter estimation in degree heterogeneous higher order networks

In sensitive applications involving relational datasets, protecting information about individual links from adversarial queries is of paramount importance. In many such settings, the available data are summarized solely through the degrees…

Machine Learning · Statistics 2026-02-05 Bibhabasu Mandal, Sagnik Nandy

Byzantine Machine Learning: MultiKrum and an optimal notion of robustness

Aggregation rules are the cornerstone of distributed (or federated) learning in the presence of adversaries, under the so-called Byzantine threat model. They are also interesting mathematical objects from the point of view of robust mean…

Machine Learning · Statistics 2026-02-05 Gilles Bareilles, Wassim Bouaziz, Julien Fageot, El-Mahdi El-Mhamdi

A Hitchhiker's Guide to Poisson Gradient Estimation

Poisson-distributed latent variable models are widely used in computational neuroscience, but differentiating through discrete stochastic samples remains challenging. Two approaches address this: Exponential Arrival Time (EAT) simulation…

Machine Learning · Statistics 2026-02-05 Michael Ibrahim, Hanqi Zhao, Eli Sennesh, Zhi Li, Anqi Wu, Jacob L. Yates, Chengrui Li, Hadi Vafaii

Transcendental Regularization of Finite Mixtures: Theoretical Guarantees and Practical Limitations

Finite mixture models are widely used for unsupervised learning, but maximum likelihood estimation via EM suffers from degeneracy as components collapse. We introduce transcendental regularization, a penalized likelihood framework with…

Machine Learning · Statistics 2026-02-05 Ernest Fokoué

Privacy Amplification by Missing Data

Privacy preservation is a fundamental requirement in many high-stakes domains such as medicine and finance, where sensitive personal data must be analyzed without compromising individual confidentiality. At the same time, these applications…

Machine Learning · Statistics 2026-02-05 Simon Roburin, Rafaël Pinot, Erwan Scornet

It's all In the (Exponential) Family: An Equivalence between Maximum Likelihood Estimation and Control Variates for Sketching Algorithms

Maximum likelihood estimators (MLE) and control variate estimators (CVE) have been used in conjunction with known information across sketching algorithms and applications in machine learning. We prove that under certain conditions in an…

Machine Learning · Statistics 2026-02-05 Keegan Kang, Kerong Wang, Ding Zhang, Rameshwar Pratap, Bhisham Dev Verma, Benedict H. W. Wong

Unified Unbiased Variance Estimation for Maximum Mean Discrepancy: Robust Finite-Sample Performance with Imbalanced Data and Exact Acceleration under Null and Alternative Hypotheses

The maximum mean discrepancy (MMD) is a kernel-based nonparametric statistic for two-sample testing, whose inferential accuracy depends critically on variance characterization. Existing work provides various finite-sample estimators of the…

Machine Learning · Statistics 2026-02-05 Shijie Zhong, Yikun Yang, Da Gong, Jiangfeng Fu

Inference-Time Alignment for Diffusion Models via Variationally Stable Doob's Matching

Inference-time alignment for diffusion models aims to adapt a pre-trained reference diffusion model toward a target distribution without retraining the reference score network, thereby preserving the generative capacity of the reference…

Machine Learning · Statistics 2026-02-05 Jinyuan Chang, Chenguang Duan, Yuling Jiao, Yi Xu, Jerry Zhijian Yang

Learning from Neighbors with PHIBP: Predicting Infectious Disease Dynamics in Data-Sparse Environments

Modeling sparse count data, which arise across numerous scientific fields, presents significant statistical challenges. This chapter addresses these challenges in the context of infectious disease prediction, with a focus on predicting…

Machine Learning · Statistics 2026-02-05 Edwin Fong, Lancelot F. James, Juho Lee

A Novel Framework Using Variational Inference with Normalizing Flows to Train Transport Reversible Jump Proposals

We propose a unified framework that employs variational inference (VI) with (conditional) normalizing flows (NFs) to train both between-model and within-model proposals for reversible jump Markov chain Monte Carlo, enabling efficient…

Machine Learning · Statistics 2026-02-05 Pingping Yin, Xiyun Jiao

Sharpness of Minima in Deep Matrix Factorization

Understanding the geometry of the loss landscape near a minimum is key to explaining the implicit bias of gradient-based methods in non-convex optimization problems such as deep neural network training and deep matrix factorization. A…

Machine Learning · Statistics 2026-02-05 Anil Kamber, Rahul Parhi

Singleton-Optimized Conformal Prediction

Conformal prediction can be used to construct prediction sets that cover the true outcome with a desired probability, but can sometimes lead to large prediction sets that are costly in practice. The most useful outcome is a singleton…

Machine Learning · Statistics 2026-02-05 Tao Wang, Yan Sun, Edgar Dobriban

DP-SPRT: Differentially Private Sequential Probability Ratio Tests

We revisit Wald's celebrated Sequential Probability Ratio Test for sequential tests of two simple hypotheses, under privacy constraints. We propose DP-SPRT, a wrapper that can be calibrated to achieve desired error probabilities and privacy…

Machine Learning · Statistics 2026-02-05 Thomas Michel, Debabrota Basu, Emilie Kaufmann

Scalable Deep Basis Kernel Gaussian Processes

Learning expressive kernels while retaining tractable inference remains a central challenge in scaling Gaussian processes (GPs) to large and complex datasets. We propose a scalable GP regressor based on deep basis kernels (DBKs). Our DBK is…

Machine Learning · Statistics 2026-02-05 Yunqin Zhu, Henry Shaowu Yuchi, Yao Xie

Predictive Low Rank Matrix Learning under Partial Observations: Mixed-Projection ADMM

We study the problem of learning a partially observed matrix under the low rank assumption in the presence of fully observed side information that depends linearly on the true underlying matrix. This problem consists of an important…

Machine Learning · Statistics 2026-02-05 Dimitris Bertsimas, Nicholas A. G. Johnson

P-Tensors: a General Formalism for Constructing Higher Order Message Passing Networks

Several recent papers have proposed increasing the expressive power of graph neural networks by exploiting subgraphs or other topological structures. In parallel, researchers have investigated higher order permutation equivariant networks.…

Machine Learning · Statistics 2026-02-05 Andrew Hands, Tianyi Sun, Risi Kondor

Preference-based Conditional Treatment Effects and Policy Learning

We introduce a new preference-based framework for conditional treatment effect estimation and policy learning, built on the Conditional Preference-based Treatment Effect (CPTE). CPTE requires only that outcomes be ranked under a preference…

Machine Learning · Statistics 2026-02-04 Dovid Parnas, Mathieu Even, Julie Josse, Uri Shalit

Fast Sampling for Flows and Diffusions with Lazy and Point Mass Stochastic Interpolants

Stochastic interpolants unify flows and diffusions, popular generative modeling frameworks. A primary hyperparameter in these methods is the interpolation schedule that determines how to bridge a standard Gaussian base measure to an…

Machine Learning · Statistics 2026-02-04 Gabriel Damsholt, Jes Frellsen, Susanne Ditlevsen

Improved Analysis of the Accelerated Noisy Power Method with Applications to Decentralized PCA

We analyze the Accelerated Noisy Power Method, an algorithm for Principal Component Analysis in the setting where only inexact matrix-vector products are available, which can arise for instance in decentralized PCA. While previous works…

Machine Learning · Statistics 2026-02-04 Pierre Aguié, Mathieu Even, Laurent Massoulié

Generator-based Graph Generation via Heat Diffusion

Graph generative modelling has become an essential task due to the wide range of applications in chemistry, biology, social networks, and knowledge representation. In this work, we propose a novel framework for generating graphs by adapting…

Machine Learning · Statistics 2026-02-04 Anthony Stephenson, Ian Gallagher, Christopher Nemeth