Machine Learning - Scifaro

Improving the Variance of Differentially Private Randomized Experiments through Clustering

Estimating causal effects from randomized experiments is only possible if participants are willing to disclose their potentially sensitive responses. Differential privacy, a widely used framework for ensuring an algorithm's privacy…

Machine Learning · Statistics 2025-05-29 Adel Javanmard, Vahab Mirrokni, Jean Pouget-Abadie

Moment Expansions of the Energy Distance

The energy distance is used to test distributional equality, and as a loss function in machine learning. While $D^2(X, Y)=0$ only when $X\sim Y$, the sensitivity to different moments is of practical importance. This work considers $D^2(X,…

Machine Learning · Statistics 2025-05-28 Ian Langmore

Covariate-Adjusted Deep Causal Learning for Heterogeneous Panel Data Models

This paper studies the task of estimating heterogeneous treatment effects in causal panel data models, in the presence of covariate effects. We propose a novel Covariate-Adjusted Deep Causal Learning (CoDEAL) for panel data models, that…

Machine Learning · Statistics 2025-05-28 Guanhao Zhou, Yuefeng Han, Xiufan Yu

Kernel Quantile Embeddings and Associated Probability Metrics

Embedding probability distributions into reproducing kernel Hilbert spaces (RKHS) has enabled powerful nonparametric methods such as the maximum mean discrepancy (MMD), a statistical distance with strong theoretical and computational…

Machine Learning · Statistics 2025-05-28 Masha Naslidnyk, Siu Lun Chau, François-Xavier Briol, Krikamol Muandet

Differentially private ratio statistics

Ratio statistics--such as relative risk and odds ratios--play a central role in hypothesis testing, model evaluation, and decision-making across many areas of machine learning, including causal inference and fairness analysis. However,…

Machine Learning · Statistics 2025-05-28 Tomer Shoham, Katrina Ligett

Linear Bandits with Non-i.i.d. Noise

We study the linear stochastic bandit problem, relaxing the standard i.i.d. assumption on the observation noise. As an alternative to this restrictive assumption, we allow the noise terms across rounds to be sub-Gaussian but interdependent,…

Machine Learning · Statistics 2025-05-28 Baptiste Abélès, Eugenio Clerico, Hamish Flynn, Gergely Neu

Sample and Map from a Single Convex Potential: Generation using Conjugate Moment Measures

The canonical approach in generative modeling is to split model fitting into two blocks: define first how to sample noise (e.g. Gaussian) and choose next what to do with it (e.g. using a single map or flows). We explore in this work an…

Machine Learning · Statistics 2025-05-28 Nina Vesseron, Louis Béthune, Marco Cuturi

Comparison of the Cox proportional hazards model and Random Survival Forest algorithm for predicting patient-specific survival probabilities in clinical trial data

The Cox proportional hazards model is often used to analyze data from Randomized Controlled Trials (RCT) with time-to-event outcomes. Random survival forest (RSF) is a machine-learning algorithm known for its high predictive performance. We…

Machine Learning · Statistics 2025-05-28 Ricarda Graf, Susan Todd, M. Fazil Baksh

Generalizable and Robust Spectral Method for Multi-view Representation Learning

Multi-view representation learning (MvRL) has garnered substantial attention in recent years, driven by the increasing demand for applications that can effectively process and analyze data from multiple sources. In this context, graph…

Machine Learning · Statistics 2025-05-28 Amitai Yacobi, Ofir Lindenbaum, Uri Shaham

Annealing Flow Generative Models Towards Sampling High-Dimensional and Multi-Modal Distributions

Sampling from high-dimensional, multi-modal distributions remains a fundamental challenge across domains such as statistical Bayesian inference and physics-based machine learning. In this paper, we propose Annealing Flow (AF), a method…

Machine Learning · Statistics 2025-05-28 Dongze Wu, Yao Xie

Fast Calculation of Feature Contributions in Boosting Trees

Recently, several fast algorithms have been proposed to decompose predicted value into Shapley values, enabling individualized feature contribution analysis in tree models. While such local decomposition offers valuable insights, it…

Machine Learning · Statistics 2025-05-28 Zhongli Jiang, Min Zhang, Dabao Zhang

On a Neural Implementation of Brenier's Polar Factorization

In 1991, Brenier proved a theorem that generalizes the polar decomposition for square matrices -- factored as PSD $\times$ unitary -- to any vector field $F:\mathbb{R}^d\rightarrow \mathbb{R}^d$. The theorem, known as the polar…

Machine Learning · Statistics 2025-05-28 Nina Vesseron, Marco Cuturi

Dual-Directed Algorithm Design for Efficient Pure Exploration

While experimental design often focuses on selecting the single best alternative from a finite set (e.g., in ranking and selection or best-arm identification), many pure-exploration problems pursue richer goals. Given a specific goal,…

Machine Learning · Statistics 2025-05-28 Chao Qin, Wei You

Learning with Selectively Labeled Data from Multiple Decision-makers

We study the problem of classification with selectively labeled data, whose distribution may differ from the full population due to historical decision-making. We exploit the fact that in many applications historical decisions were made by…

Machine Learning · Statistics 2025-05-28 Jian Chen, Zhehao Li, Xiaojie Mao

No Free Lunch: Non-Asymptotic Analysis of Prediction-Powered Inference

Prediction-Powered Inference (PPI) is a popular strategy for combining gold-standard and possibly noisy pseudo-labels to perform statistical estimation. Prior work has shown an asymptotic "free lunch" for PPI++, an adaptive form of PPI,…

Machine Learning · Statistics 2025-05-27 Pranav Mani, Peng Xu, Zachary C. Lipton, Michael Oberst

Weighted Leave-One-Out Cross Validation

We present a weighted version of Leave-One-Out (LOO) cross-validation for estimating the Integrated Squared Error (ISE) when approximating an unknown function by a predictor that depends linearly on evaluations of the function over a finite…

Machine Learning · Statistics 2025-05-27 Luc Pronzato, Maria-João Rendas

PIGPVAE: Physics-Informed Gaussian Process Variational Autoencoders

Recent advances in generative AI offer promising solutions for synthetic data generation but often rely on large datasets for effective training. To address this limitation, we propose a novel generative model that learns from limited data…

Machine Learning · Statistics 2025-05-27 Michail Spitieris, Massimiliano Ruocco, Abdulmajid Murad, Alessandro Nocente

Statistical inference for Linear Stochastic Approximation with Markovian Noise

In this paper we derive non-asymptotic Berry-Esseen bounds for Polyak-Ruppert averaged iterates of the Linear Stochastic Approximation (LSA) algorithm driven by Markovian noise. Our analysis yields $\mathcal{O}(n^{-1/4})$ convergence…

Machine Learning · Statistics 2025-05-27 Sergey Samsonov, Marina Sheshukova, Eric Moulines, Alexey Naumov

Optimal Conformal Prediction under Epistemic Uncertainty

Conformal prediction (CP) is a popular frequentist framework for representing uncertainty by providing prediction sets that guarantee coverage of the true label with a user-adjustable probability. In most applications, CP operates on…

Machine Learning · Statistics 2025-05-27 Alireza Javanmardi, Soroush H. Zargarbashi, Santo M. A. R. Thies, Willem Waegeman, Aleksandar Bojchevski, Eyke Hüllermeier

On the Role of Label Noise in the Feature Learning Process

Deep learning with noisy labels presents significant challenges. In this work, we theoretically characterize the role of label noise from a feature learning perspective. Specifically, we consider a signal-noise data distribution, where each…

Machine Learning · Statistics 2025-05-27 Andi Han, Wei Huang, Zhanpeng Zhou, Gang Niu, Wuyang Chen, Junchi Yan, Akiko Takeda, Taiji Suzuki