Machine Learning - Scifaro

Accelerating Hamiltonian Monte Carlo for Bayesian Inference in Neural Networks and Neural Operators

Hamiltonian Monte Carlo (HMC) is a powerful and accurate method to sample from the posterior distribution in Bayesian inference. However, HMC techniques are computationally demanding for Bayesian neural networks due to the high…

Machine Learning · Statistics · 2025-09-11 · Ponkrshnan Thiagarajan, Tamer A. Zaki, Michael D. Shields
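For readers new to HMC, here is a minimal sketch of plain (unaccelerated) Hamiltonian Monte Carlo. It uses a leapfrog integrator and a Metropolis accept/reject step, and it targets a 1-D standard normal. This is the generic textbook algorithm, not the accelerated scheme proposed in the paper. The function name `hmc_sample` and the defaults for `step_size` and `n_leapfrog` are illustrative assumptions.

```python
import math
import random

def hmc_sample(log_prob, grad_log_prob, x0, n_samples, step_size=0.1, n_leapfrog=20, seed=0):
    """Basic 1-D HMC with unit mass; returns a list of n_samples draws."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        p0 = rng.gauss(0.0, 1.0)  # fresh momentum ~ N(0, 1)
        x_new, p = x, p0
        # Leapfrog: half momentum step, alternating full steps, final half momentum step.
        p += 0.5 * step_size * grad_log_prob(x_new)
        for i in range(n_leapfrog):
            x_new += step_size * p
            if i < n_leapfrog - 1:
                p += step_size * grad_log_prob(x_new)
        p += 0.5 * step_size * grad_log_prob(x_new)
        # Hamiltonian H(x, p) = -log_prob(x) + p^2 / 2; accept with prob min(1, exp(H_old - H_new)).
        h_old = -log_prob(x) + 0.5 * p0 * p0
        h_new = -log_prob(x_new) + 0.5 * p * p
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(x)
    return samples

# Target: standard normal, log p(x) = -x^2 / 2 up to a constant.
samples = hmc_sample(lambda x: -0.5 * x * x, lambda x: -x, x0=0.0, n_samples=5000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean={mean:.3f} var={var:.3f}")  # should be close to 0 and 1
```

The cost per sample is dominated by the `n_leapfrog` gradient evaluations. This is why HMC becomes expensive when each gradient requires a full pass through a neural network.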

Statistical-Computational Trade-offs for Recursive Adaptive Partitioning Estimators

Models based on recursive adaptive partitioning such as decision trees and their ensembles are popular for high-dimensional regression as they can potentially avoid the curse of dimensionality. Because empirical risk minimization (ERM) is…

Machine Learning · Statistics · 2025-09-11 · Yan Shuo Tan, Jason M. Klusowski, Krishnakumar Balasubramanian

PQMass: Probabilistic Assessment of the Quality of Generative Models using Probability Mass Estimation

We propose a likelihood-free method for comparing two distributions given samples from each, with the goal of assessing the quality of generative models. The proposed approach, PQMass, provides a statistically rigorous method for assessing…

Machine Learning · Statistics · 2025-09-11 · Pablo Lemos, Sammy Sharief, Nikolay Malkin, Salma Salhi, Connor Stone, Laurence Perreault-Levasseur, Yashar Hezaveh

Identifying Neural Signatures from fMRI using Hybrid Principal Components Regression

Recent advances in neuroimaging analysis have enabled accurate decoding of mental state from brain activation patterns during functional magnetic resonance imaging scans. A commonly applied tool for this purpose is principal components…

Machine Learning · Statistics · 2025-09-10 · Jared Rieck, Julia Wrobel, Joshua L. Gowin, Yue Wang, Martin Paulus, Ryan Peterson

NestGNN: A Graph Neural Network Framework Generalizing the Nested Logit Model for Travel Mode Choice

Nested logit (NL) has been commonly used for discrete choice analysis, including a wide range of applications such as travel mode choice, automobile ownership, or location decisions. However, the classical NL models are restricted by their…

Machine Learning · Statistics · 2025-09-10 · Yuqi Zhou, Zhanhong Cheng, Lingqian Hu, Yuheng Bu, Shenhao Wang
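As background for the NL baseline that NestGNN generalizes, here is a minimal sketch of the textbook two-level nested logit. The choice probability factors as P(i) = P(nest k) · P(i | nest k), with both factors computed from the nest's inclusive value (log-sum). The mode names, utilities, and nest parameters (`lambdas`) in the usage lines are made-up illustrations. Nothing here reflects the NestGNN architecture itself.

```python
import math

def nested_logit_probs(utilities, nests, lambdas):
    """Two-level nested logit choice probabilities.

    utilities: dict alternative -> systematic utility V_i
    nests:     dict nest name -> list of alternatives in that nest
    lambdas:   dict nest name -> dissimilarity parameter in (0, 1]
    """
    # Inclusive value of each nest: I_k = log sum_{j in k} exp(V_j / lambda_k)
    inclusive = {
        k: math.log(sum(math.exp(utilities[a] / lambdas[k]) for a in alts))
        for k, alts in nests.items()
    }
    denom = sum(math.exp(lambdas[k] * inclusive[k]) for k in nests)
    probs = {}
    for k, alts in nests.items():
        p_nest = math.exp(lambdas[k] * inclusive[k]) / denom  # P(nest k)
        for a in alts:
            p_within = math.exp(utilities[a] / lambdas[k] - inclusive[k])  # P(a | nest k)
            probs[a] = p_nest * p_within
    return probs

# Hypothetical example: car alone in one nest, bus and rail sharing a "transit" nest.
V = {"car": 1.0, "bus": 0.5, "rail": 0.5}
nests = {"auto": ["car"], "transit": ["bus", "rail"]}
probs = nested_logit_probs(V, nests, {"auto": 1.0, "transit": 0.5})
print(probs)
```

Setting every λ_k to 1 collapses the model to a plain multinomial logit (a softmax over utilities). Values below 1 encode correlated unobserved utility among alternatives in the same nest.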

ADHAM: Additive Deep Hazard Analysis Mixtures for Interpretable Survival Regression

Survival analysis is a fundamental tool for modeling time-to-event outcomes in healthcare. Recent advances have introduced flexible neural network approaches for improved predictive performance. However, most of these models do not provide…

Machine Learning · Statistics · 2025-09-10 · Mert Ketenci, Vincent Jeanselme, Harry Reyes Nieva, Shalmali Joshi, Noémie Elhadad

Inexact Column Generation for Bayesian Network Structure Learning via Difference-of-Submodular Optimization

In this paper, we consider a score-based Integer Programming (IP) approach for solving the Bayesian Network Structure Learning (BNSL) problem. State-of-the-art BNSL IP formulations suffer from the exponentially large number of variables and…

Machine Learning · Statistics · 2025-09-10 · Yiran Yang, Rui Chen

Analytic theory of dropout regularization

Dropout is a regularization technique widely used in training artificial neural networks to mitigate overfitting. It consists of dynamically deactivating subsets of the network during training to promote more robust representations. Despite…

Machine Learning · Statistics · 2025-09-10 · Francesco Mori, Francesca Mignacco
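The mechanism described above, randomly deactivating units during training, is usually implemented as "inverted" dropout. Each unit is zeroed with probability p, and the surviving units are scaled by 1/(1 - p) so that the expected activation is unchanged. At evaluation time the layer is the identity. The sketch below shows that standard variant on a plain Python list. It illustrates the technique only, not the analytic model studied in the paper, and the helper name `dropout` is illustrative.

```python
import random

def dropout(activations, p, training=True, rng=None):
    """Inverted dropout on a list of activations."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)  # identity at evaluation time
    rng = rng or random.Random()
    keep = 1.0 - p
    # Keep each unit with probability `keep`, rescaling kept units by 1 / keep.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

h = [1.0] * 10000
out = dropout(h, p=0.5, rng=random.Random(0))
print(sum(out) / len(out))  # close to 1.0: the expected activation is preserved
```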

Convergence of Batch Asynchronous Stochastic Approximation With Applications to Reinforcement Learning

We begin by briefly surveying some results on the convergence of the Stochastic Gradient Descent (SGD) Method, proved in a companion paper by the present authors. These results are based on viewing SGD as a version of Stochastic…

Machine Learning · Statistics · 2025-09-10 · Rajeeva L. Karandikar, M. Vidyasagar
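SGD is a classical instance of stochastic approximation, the subject named in the title. The minimal Robbins-Monro style sketch below illustrates the idea. Noisy gradients combined with step sizes α_n satisfying Σα_n = ∞ and Σα_n² < ∞ (here α_n = 1/(n + 1)) drive the iterate to the minimizer of f(θ) = E[(θ - X)²]/2 with X ~ N(3, 1). The example is single-threaded and unbatched, so it does not model the batch asynchronous updates analyzed in the paper. The objective and the name `stochastic_approximation` are hypothetical.

```python
import random

def stochastic_approximation(noisy_grad, theta0, n_steps, step=lambda n: 1.0 / (n + 1), seed=0):
    """Robbins-Monro / SGD iteration: theta_{n+1} = theta_n - alpha_n * g_n(theta_n)."""
    rng = random.Random(seed)
    theta = theta0
    for n in range(n_steps):
        theta -= step(n) * noisy_grad(theta, rng)
    return theta

# Noisy gradient of f(theta) = E[(theta - X)^2] / 2 with X ~ N(3, 1): g = theta - X.
theta = stochastic_approximation(lambda th, rng: th - rng.gauss(3.0, 1.0), theta0=0.0, n_steps=20000)
print(theta)  # close to the minimizer 3.0
```

With α_n = 1/(n + 1), this particular iteration reduces exactly to the running sample mean of the draws, which makes its convergence easy to check by hand.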

Learning from one graph: transductive learning guarantees via the geometry of small random worlds

Since their introduction by Kipf and Welling in 2017, a primary use of graph convolutional networks is transductive node classification, where missing labels are inferred within a single observed graph and its feature matrix. Despite the…

Machine Learning · Statistics · 2025-09-09 · Nils Detering, Luca Galimberti, Anastasis Kratsios, Giulia Livieri, A. Martina Neuman

Sequential Least-Squares Estimators with Fast Randomized Sketching for Linear Statistical Models

We propose a novel randomized framework for the estimation problem of large-scale linear statistical models, namely Sequential Least-Squares Estimators with Fast Randomized Sketching (SLSE-FRS), which integrates Sketch-and-Solve and…

Machine Learning · Statistics · 2025-09-09 · Guan-Yu Chen, Xi Yang

Automated Hierarchical Graph Construction for Multi-source Electronic Health Records

Electronic Health Records (EHRs), comprising diverse clinical data such as diagnoses, medications, and laboratory results, hold great promise for translational research. EHR-derived data have advanced disease prevention, improved clinical…

Machine Learning · Statistics · 2025-09-09 · Yinjie Wang, Doudou Zhou, Yue Liu, Junwei Lu, Tianxi Cai

Robust and Adaptive Spectral Method for Representation Multi-Task Learning with Contamination

Representation-based multi-task learning (MTL) improves efficiency by learning a shared structure across tasks, but its practical application is often hindered by contamination, outliers, or adversarial tasks. Most existing methods and…

Machine Learning · Statistics · 2025-09-09 · Yian Huang, Yang Feng, Zhiliang Ying

MOSAIC: Minimax-Optimal Sparsity-Adaptive Inference for Change Points in Dynamic Networks

We propose a new inference framework, named MOSAIC, for change-point detection in dynamic networks with the simultaneous low-rank and sparse-change structure. We establish the minimax rate of the detection boundary, which relies on the sparsity…

Machine Learning · Statistics · 2025-09-09 · Yingying Fan, Jingyuan Liu, Jinchi Lv, Ao Sun

Additive Distributionally Robust Ranking and Selection

Ranking and selection (R&S) aims to identify the alternative with the best mean performance among $k$ simulated alternatives. The practical value of R&S depends on accurate simulation input modeling, which often suffers from the curse of…

Machine Learning · Statistics · 2025-09-09 · Zaile Li, Yuchen Wan, L. Jeff Hong

Fisher Random Walk: Automatic Debiasing Contextual Preference Inference for Large Language Model Evaluation

Motivated by the need for rigorous and scalable evaluation of large language models, we study contextual preference inference for pairwise comparison functionals of context-dependent preference score functions across domains. Focusing on…

Machine Learning · Statistics · 2025-09-09 · Yichi Zhang, Alexander Belloni, Ethan X. Fang, Junwei Lu, Xiaoan Xu

Cryo-EM as a Stochastic Inverse Problem

Cryo-electron microscopy (Cryo-EM) enables high-resolution imaging of biomolecules, but structural heterogeneity remains a major challenge in 3D reconstruction. Traditional methods assume a discrete set of conformations, limiting their…

Machine Learning · Statistics · 2025-09-09 · Diego Sanchez Espinosa, Erik H Thiede, Yunan Yang

Probabilistic operator learning: generative modeling and uncertainty quantification for foundation models of differential equations

In-context operator networks (ICON) are a class of operator learning methods based on the novel architectures of foundation models. Trained on a diverse set of datasets of initial and boundary conditions paired with corresponding solutions…

Machine Learning · Statistics · 2025-09-09 · Benjamin J. Zhang, Siting Liu, Stanley J. Osher, Markos A. Katsoulakis

Quantum-inspired probability metrics define a complete, universal space for statistical learning

Comparing probability distributions is a core challenge across the natural, social, and computational sciences. Existing methods, such as Maximum Mean Discrepancy (MMD), struggle in high-dimensional and non-compact domains. Here we…

Machine Learning · Statistics · 2025-09-09 · Logan S. McCarty
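For reference, the MMD baseline named in the abstract can be estimated from two samples using a kernel. The sketch below computes the standard unbiased estimate of MMD² with a Gaussian (RBF) kernel in pure Python. The bandwidth of 1.0 and the synthetic 2-D Gaussian samples are arbitrary choices. The quantum-inspired metrics proposed in the paper are not implemented here.

```python
import math
import random

def rbf(x, y, bandwidth=1.0):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * bandwidth ** 2))

def mmd2_unbiased(xs, ys, bandwidth=1.0):
    """Unbiased MMD^2: mean k(x, x') + mean k(y, y') - 2 mean k(x, y), excluding i == j pairs."""
    m, n = len(xs), len(ys)
    kxx = sum(rbf(xs[i], xs[j], bandwidth) for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    kyy = sum(rbf(ys[i], ys[j], bandwidth) for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    kxy = sum(rbf(x, y, bandwidth) for x in xs for y in ys) / (m * n)
    return kxx + kyy - 2.0 * kxy

rng = random.Random(0)

def sample(shift, size=200):
    """Draw `size` points from a 2-D Gaussian with mean (shift, shift) and unit variance."""
    return [[rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)] for _ in range(size)]

a, b, c = sample(0.0), sample(0.0), sample(1.5)
same, shifted = mmd2_unbiased(a, b), mmd2_unbiased(a, c)
print(same, shifted)  # near 0 for matching distributions, clearly positive after the shift
```

The double loops cost O((m + n)²) kernel evaluations.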

KD$^{2}$M: A unifying framework for feature knowledge distillation

Knowledge Distillation (KD) seeks to transfer the knowledge of a teacher network to a student neural net. This process is often done by matching the networks' predictions (i.e., their outputs), but recently several works have proposed to…

Machine Learning · Statistics · 2025-09-09 · Eduardo Fernandes Montesuma
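The prediction-matching form of distillation mentioned in the abstract is commonly implemented as a temperature-softened KL divergence between teacher and student outputs. The sketch below shows that classic logit-matching loss, scaled by T² as is customary. It does not implement the feature-distillation framework that KD$^{2}$M proposes. The temperature value and the function names are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """T^2 * KL(teacher_soft || student_soft) on temperature-softened predictions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return temperature ** 2 * kl

teacher = [2.0, 1.0, 0.1]
print(distillation_loss(teacher, teacher))          # 0.0: student matches teacher
print(distillation_loss([0.1, 1.0, 2.0], teacher))  # positive: predictions disagree
```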