Machine Learning - Scifaro

Precise High-Dimensional Asymptotics for Quantifying Heterogeneous Transfers

The problem of learning one task using samples from another task is central to transfer learning. In this paper, we focus on answering the following question: when does combining the samples from two related tasks perform better than…

Machine Learning · Statistics 2025-06-11 Fan Yang, Hongyang R. Zhang, Sen Wu, Christopher Ré, Weijie J. Su

High-Dimensional Independence Testing via Maximum and Average Distance Correlations

This paper investigates the utilization of maximum and average distance correlations for multivariate independence testing. We characterize their consistency properties in high-dimensional settings with respect to the number of marginally…

Machine Learning · Statistics 2025-06-11 Cencheng Shen, Yuexiao Dong

Quickest Causal Change Point Detection by Adaptive Intervention

We propose an algorithm for change point monitoring in linear causal models that accounts for interventions. Through a special centralization technique, we can concentrate the changes arising from causal propagation across nodes into a…

Machine Learning · Statistics 2025-06-10 Haijie Xu, Chen Zhang

Quantile-Optimal Policy Learning under Unmeasured Confounding

We study quantile-optimal policy learning where the goal is to find a policy whose reward distribution has the largest $\alpha$-quantile for some $\alpha \in (0, 1)$. We focus on the offline setting whose generating process involves…

Machine Learning · Statistics 2025-06-10 Zhongren Chen, Siyu Chen, Zhengling Qi, Xiaohong Chen, Zhuoran Yang

Half-AVAE: Adversarial-Enhanced Factorized and Structured Encoder-Free VAE for Underdetermined Independent Component Analysis

This study advances the Variational Autoencoder (VAE) framework by addressing challenges in Independent Component Analysis (ICA) under both determined and underdetermined conditions, focusing on enhancing the independence and…

Machine Learning · Statistics 2025-06-10 Yuan-Hao Wei, Yan-Jie Sun

The Currents of Conflict: Decomposing Conflict Trends with Gaussian Processes

I present a novel approach to estimating the temporal and spatial patterns of violent conflict. I show how we can use highly temporally and spatially disaggregated data on conflict events in tandem with Gaussian processes to estimate…

Machine Learning · Statistics 2025-06-10 Simon P. von der Maase

Continuous Semi-Implicit Models

Semi-implicit distributions have shown great promise in variational inference and generative modeling. Hierarchical semi-implicit models, which stack multiple semi-implicit layers, enhance the expressiveness of semi-implicit distributions…

Machine Learning · Statistics 2025-06-10 Longlin Yu, Jiajun Zha, Tong Yang, Tianyu Xie, Xiangyu Zhang, S.-H. Gary Chan, Cheng Zhang

Robust Learnability of Sample-Compressible Distributions under Noisy or Adversarial Perturbations

Learning distribution families over $\mathbb{R}^d$ is a fundamental problem in unsupervised learning and statistics. A central question in this setting is whether a given family of distributions possesses sufficient structure to be (at…

Machine Learning · Statistics 2025-06-10 Arefe Boushehrian, Amir Najafi

Direct Fisher Score Estimation for Likelihood Maximization

We study the problem of likelihood maximization when the likelihood function is intractable but model simulations are readily available. We propose a sequential, gradient-based optimization method that directly models the Fisher score based…

Machine Learning · Statistics 2025-06-10 Sherman Khoo, Yakun Wang, Song Liu, Mark Beaumont

Equilibrium Distribution for t-Distributed Stochastic Neighbor Embedding with Generalized Kernels

T-distributed stochastic neighbor embedding (t-SNE) is a well-known algorithm for visualizing high-dimensional data by finding low-dimensional representations. In this paper, we study the convergence of t-SNE with generalized kernels and…

Machine Learning · Statistics 2025-06-10 Yi Gu

InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference

Inferring Gene Regulatory Networks (GRNs) from gene expression data is crucial for understanding biological processes. While supervised models are reported to achieve high performance for this task, they rely on costly ground truth (GT)…

Machine Learning · Statistics 2025-06-10 Tianyu Cui, Song-Jun Xu, Artem Moskalev, Shuwei Li, Tommaso Mansi, Mangal Prakash, Rui Liao

On the kernel learning problem

The classical kernel ridge regression problem aims to find the best fit for the output $Y$ as a function of the input data $X\in \mathbb{R}^d$, with a fixed choice of regularization term imposed by a given choice of a reproducing kernel…

Machine Learning · Statistics 2025-06-10 Yang Li, Feng Ruan

Scalable Sobolev IPM for Probability Measures on a Graph

We investigate the Sobolev IPM problem for probability measures supported on a graph metric space. Sobolev IPM is an important instance of integral probability metrics (IPM), and is obtained by constraining a critic function within a unit…

Machine Learning · Statistics 2025-06-10 Tam Le, Truyen Nguyen, Hideitsu Hino, Kenji Fukumizu

Prediction-Enhanced Monte Carlo: A Machine Learning View on Control Variate

For many complex simulation tasks spanning areas such as healthcare, engineering, and finance, Monte Carlo (MC) methods are invaluable due to their unbiased estimates and precise error quantification. Nevertheless, Monte Carlo simulations…

Machine Learning · Statistics 2025-06-10 Fengpei Li, Haoxian Chen, Jiahe Lin, Arkin Gupta, Xiaowei Tan, Honglei Zhao, Gang Xu, Yuriy Nevmyvaka, Agostino Capponi, Henry Lam

Certifiably Robust Model Evaluation in Federated Learning under Meta-Distributional Shifts

We address the challenge of certifying the performance of a federated learning model on an unseen target network using only measurements from the source network that trained the model. Specifically, consider a source network "A" with $K$…

Machine Learning · Statistics 2025-06-10 Amir Najafi, Samin Mahdizadeh Sani, Farzan Farnia

Theoretical Limitations of Ensembles in the Age of Overparameterization

Classic ensembles generalize better than any single component model. In contrast, recent empirical studies find that modern ensembles of (overparameterized) neural networks may not provide any inherent generalization advantage over single…

Machine Learning · Statistics 2025-06-10 Niclas Dern, John P. Cunningham, Geoff Pleiss

Generalization and Robustness of the Tilted Empirical Risk

The generalization error (risk) of a supervised statistical learning algorithm quantifies its prediction ability on previously unseen data. Inspired by exponential tilting, Li et al. (2020) proposed the tilted empirical risk (TER)…

Machine Learning · Statistics 2025-06-10 Gholamali Aminian, Amir R. Asadi, Tian Li, Ahmad Beirami, Gesine Reinert, Samuel N. Cohen

Neural Flow Diffusion Models: Learnable Forward Process for Improved Diffusion Modelling

Conventional diffusion models typically rely on a fixed forward process, which implicitly defines complex marginal distributions over latent variables. This can often complicate the reverse process' task in learning generative…

Machine Learning · Statistics 2025-06-10 Grigory Bartosh, Dmitry Vetrov, Christian A. Naesseth

FairICP: Encouraging Equalized Odds via Inverse Conditional Permutation

Equalized odds, an important notion of algorithmic fairness, aims to ensure that sensitive variables, such as race and gender, do not unfairly influence the algorithm's prediction when conditioning on the true outcome. Despite…

Machine Learning · Statistics 2025-06-10 Yuheng Lai, Leying Guan

Nonparametric Modern Hopfield Models

We present a nonparametric interpretation for deep learning compatible modern Hopfield models and utilize this new perspective to debut efficient variants. Our key contribution stems from interpreting the memory storage and retrieval…

Machine Learning · Statistics 2025-06-10 Jerry Yao-Chieh Hu, Bo-Yu Chen, Dennis Wu, Feng Ruan, Han Liu