Machine Learning - Scifaro

A Criterion for Extending Continuous-Mixture Identifiability Results

Mixture distributions provide a versatile and widely used framework for modeling random phenomena, and are particularly well-suited to the analysis of geoscientific processes and their attendant risks to society. For continuous mixtures of…

Machine Learning · Statistics 2025-06-18 Michael R. Powers, Jiaxin Xu

Low-dimensional adaptation of diffusion models: Convergence in total variation

This paper investigates how diffusion generative models leverage (unknown) low-dimensional structure to accelerate sampling. Focusing on two mainstream samplers -- the denoising diffusion implicit model (DDIM) and the denoising diffusion…

Machine Learning · Statistics 2025-06-18 Jiadong Liang, Zhihan Huang, Yuxin Chen

Variational Bayesian Bow tie Neural Networks with Shrinkage

Despite the dominant role of deep models in machine learning, limitations persist, including overconfident predictions, susceptibility to adversarial attacks, and underestimation of variability in predictions. The Bayesian paradigm provides…

Machine Learning · Statistics 2025-06-18 Alisa Sheinkman, Sara Wade

Flat Posterior Does Matter For Bayesian Model Averaging

Bayesian neural networks (BNNs) estimate the posterior distribution of model parameters and utilize posterior samples for Bayesian Model Averaging (BMA) in prediction. However, despite the crucial role of flatness in the loss landscape in…

Machine Learning · Statistics 2025-06-18 Sungjun Lim, Jeyoon Yeom, Sooyon Kim, Hoyoon Byun, Jinho Kang, Yohan Jung, Jiyoung Jung, Kyungwoo Song
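Bayesian Model Averaging, as described in the entry above, predicts by averaging the model's predictive distribution over posterior samples rather than plugging in a single parameter estimate. The sketch below illustrates only that averaging step, assuming a hypothetical logistic-regression likelihood and posterior samples that are already available; the names `bma_predict` and `sigmoid` are illustrative, and nothing here reflects the paper's flatness-aware method.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bma_predict(x: Sequence[float], posterior_samples: Sequence[Sequence[float]]) -> float:
    """Return p(y = 1 | x) averaged over posterior weight samples w_s.

    Assumes a hypothetical logistic model p(y = 1 | x, w) = sigmoid(x . w).
    Probabilities are averaged, not parameters.
    """
    if not posterior_samples:
        raise ValueError("need at least one posterior sample")
    probs = [sigmoid(sum(xi * wi for xi, wi in zip(x, w))) for w in posterior_samples]
    return sum(probs) / len(probs)
```

Averaging probabilities is what distinguishes BMA from evaluating the model at the posterior mean: with samples `[[1.0], [-1.0]]` and input `[2.0]`, the averaged prediction is 0.5, whereas the posterior-mean weight `0.0` happens to give the same value here only by symmetry.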

Generalized Random Forests using Fixed-Point Trees

We propose a computationally efficient alternative to generalized random forests (GRFs) for estimating heterogeneous effects in large dimensions. While GRFs rely on a gradient-based splitting criterion, which in large dimensions is…

Machine Learning · Statistics 2025-06-18 David Fleischer, David A. Stephens, Archer Y. Yang

Understanding Learning Invariance in Deep Linear Networks

Equivariant and invariant machine learning models exploit symmetries and structural patterns in data to improve sample efficiency. While empirical studies suggest that data-driven methods such as regularization and data augmentation can…

Machine Learning · Statistics 2025-06-17 Hao Duan, Guido Montúfar

Exploiting the Exact Denoising Posterior Score in Training-Free Guidance of Diffusion Models

The success of diffusion models has driven interest in performing conditional sampling via training-free guidance of the denoising process to solve image restoration and other inverse problems. A popular class of methods, based on Diffusion…

Machine Learning · Statistics 2025-06-17 Gregory Bellchambers

Fair Bayesian Model-Based Clustering

Fair clustering has become a socially significant task with the advancement of machine learning technologies and the growing demand for trustworthy AI. Group fairness ensures that the proportions of each sensitive group are similar in all…

Machine Learning · Statistics 2025-06-17 Jihu Lee, Kunwoong Kim, Yongdai Kim

Dependent Randomized Rounding for Budget Constrained Experimental Design

Policymakers in resource-constrained settings require experimental designs that satisfy strict budget limits while ensuring precise estimation of treatment effects. We propose a framework that applies a dependent randomized rounding…

Machine Learning · Statistics 2025-06-17 Khurram Yamin, Edward Kennedy, Bryan Wilder
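The entry above mentions dependent randomized rounding. As a generic illustration of that family of techniques (the classic pairwise scheme, not necessarily the authors' variant), the sketch below rounds fractional treatment probabilities to 0/1 assignments so that each unit keeps its marginal probability in expectation while the total number treated equals the (integer) sum of the probabilities. Unit costs are assumed, so that sum plays the role of the budget; `dependent_round` is a hypothetical name.

```python
import random
from typing import List, Optional

EPS = 1e-12  # tolerance for treating a float as already integral


def dependent_round(p: List[float], rng: Optional[random.Random] = None) -> List[int]:
    """Pairwise dependent rounding of probabilities p_i in [0, 1].

    Each step moves mass between two fractional entries so that their sum is
    preserved and each entry's expectation is unchanged; at least one of the
    two becomes 0 or 1. Assumes unit costs, so sum(p) acts as the budget.
    """
    rng = rng or random.Random()
    x = list(p)

    def fractional() -> List[int]:
        return [i for i, v in enumerate(x) if EPS < v < 1 - EPS]

    frac = fractional()
    while len(frac) >= 2:
        i, j = frac[0], frac[1]
        alpha = min(1 - x[i], x[j])  # shift mass from j to i
        beta = min(x[i], 1 - x[j])   # shift mass from i to j
        # These probabilities make E[change in x_i] = E[change in x_j] = 0.
        if rng.random() < beta / (alpha + beta):
            x[i] += alpha
            x[j] -= alpha
        else:
            x[i] -= beta
            x[j] += beta
        frac = fractional()
    if frac:  # only possible when sum(p) is not an integer
        k = frac[0]
        x[k] = 1.0 if rng.random() < x[k] else 0.0
    return [int(round(v)) for v in x]
```

Compared with independent Bernoulli rounding, which only meets the budget on average, this scheme meets it exactly on every draw when the probabilities sum to an integer.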

A Transfer Learning Framework for Multilayer Networks via Model Averaging

Link prediction in multilayer networks is a key challenge in applications such as recommendation systems and protein-protein interaction prediction. While many techniques have been developed, most rely on assumptions about shared structures…

Machine Learning · Statistics 2025-06-17 Yongqin Qiu, Xinyu Zhang

On the existence of consistent adversarial attacks in high-dimensional linear classification

What fundamentally distinguishes an adversarial attack from a misclassification due to limited model expressivity or finite data? In this work, we investigate this question in the setting of high-dimensional binary classification, where…

Machine Learning · Statistics 2025-06-17 Matteo Vilucchio, Lenka Zdeborová, Bruno Loureiro

Theoretical Tensions in RLHF: Reconciling Empirical Success with Inconsistencies in Social Choice Theory

Despite its empirical success, Reinforcement Learning from Human Feedback (RLHF) has been shown to violate almost all the fundamental axioms in social choice theory -- such as majority consistency, pairwise majority consistency, and…

Machine Learning · Statistics 2025-06-17 Jiancong Xiao, Zhekun Shi, Kaizhao Liu, Qi Long, Weijie J. Su

Temporal cross-validation impacts multivariate time series subsequence anomaly detection evaluation

Evaluating anomaly detection in multivariate time series (MTS) requires careful consideration of temporal dependencies, particularly when detecting subsequence anomalies common in fault detection scenarios. While time series…

Machine Learning · Statistics 2025-06-17 Steven C. Hespeler, Pablo Moriano, Mingyan Li, Samuel C. Hollifield
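The entry above concerns temporal cross-validation for time series. One common scheme is forward chaining (an expanding window), in which every validation block lies strictly after its training block so that no future observations leak into training. The sketch below generates such index splits; `forward_chaining_splits` and its parameters are hypothetical, and this is not necessarily the protocol the paper evaluates.

```python
from typing import Iterator, List, Tuple


def forward_chaining_splits(
    n: int, n_folds: int, min_train: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists for n time-ordered samples.

    Training always covers indices [0, train_end); validation covers the next
    block, so validation indices are strictly later than training indices.
    """
    if n_folds < 1 or not 0 < min_train < n:
        raise ValueError("need n_folds >= 1 and 0 < min_train < n")
    fold = (n - min_train) // n_folds
    if fold == 0:
        raise ValueError("not enough samples for the requested number of folds")
    for k in range(n_folds):
        train_end = min_train + k * fold
        # The last validation block absorbs any leftover samples.
        val_end = n if k == n_folds - 1 else train_end + fold
        yield list(range(train_end)), list(range(train_end, val_end))
```

For example, `forward_chaining_splits(10, 3, 4)` yields training prefixes of length 4, 6, and 8, each followed by a two-sample validation block. scikit-learn's `TimeSeriesSplit` provides a similar expanding-window splitter.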

Survey on Algorithms for multi-index models

We review the literature on algorithms for estimating the index space in a multi-index model. The primary focus is on computationally efficient (polynomial-time) algorithms in Gaussian space, the assumptions under which consistency is…

Machine Learning · Statistics 2025-06-17 Joan Bruna, Daniel Hsu

Improved Online Confidence Bounds for Multinomial Logistic Bandits

In this paper, we propose an improved online confidence bound for multinomial logistic (MNL) models and apply this result to MNL bandits, achieving variance-dependent optimal regret. Recently, Lee & Oh (2024) established an online…

Machine Learning · Statistics 2025-06-17 Joongkyu Lee, Min-hwan Oh
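The MNL bandit entry above builds on the multinomial logistic choice model. In the usual formulation with an outside (no-purchase) option of utility 0, item i in an offered assortment is chosen with probability exp(x_i · θ) / (1 + Σ_j exp(x_j · θ)). The sketch below computes these probabilities; the function name and the linear-utility form are assumptions for illustration, and the paper's confidence-bound construction is not shown.

```python
import math
from typing import List, Sequence


def mnl_choice_probs(features: Sequence[Sequence[float]], theta: Sequence[float]) -> List[float]:
    """Return [P(no choice), P(item 1), ..., P(item K)] under an MNL model.

    Assumes linear utilities u_i = x_i . theta and an outside option with utility 0.
    """
    utilities = [sum(f * t for f, t in zip(x, theta)) for x in features]
    m = max([0.0] + utilities)  # shift all utilities for numerical stability
    weights = [math.exp(-m)] + [math.exp(u - m) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]
```

With a single item of utility ln 2, the item is chosen with probability 2/3 and the outside option with probability 1/3.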

Robust Conformal Outlier Detection under Contaminated Reference Data

Conformal prediction is a flexible framework for calibrating machine learning predictions, providing distribution-free statistical guarantees. In outlier detection, this calibration relies on a reference set of labeled inlier data to…

Machine Learning · Statistics 2025-06-17 Meshi Bashari, Matteo Sesia, Yaniv Romano
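The entry above notes that conformal outlier detection calibrates against a reference set of inliers. The standard split-conformal p-value compares a test point's nonconformity score (higher meaning more outlying) with the reference scores, and the point is flagged when the p-value falls at or below a chosen level α. The sketch assumes a clean reference set, which is precisely the assumption the paper relaxes, so it shows only the baseline; the function name is hypothetical.

```python
from typing import Sequence


def conformal_p_value(test_score: float, calibration_scores: Sequence[float]) -> float:
    """Split-conformal p-value: (1 + #{calibration scores >= test score}) / (n + 1).

    Assumes higher scores indicate more outlying points and that the calibration
    scores come from clean (uncontaminated) inliers.
    """
    n = len(calibration_scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    count = sum(1 for s in calibration_scores if s >= test_score)
    return (1 + count) / (n + 1)
```

With reference scores `[0.1, 0.2, 0.3, 0.4]`, a test score of 0.5 exceeds all of them and receives the smallest attainable p-value, 1/5 = 0.2.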

Counterfactual Uncertainty Quantification of Factual Estimand of Efficacy from Before-and-After Treatment Repeated Measures Randomized Controlled Trials

This article quantifies the uncertainty reduction achievable for the counterfactual estimand, and cautions against potential bias when the estimand uses Digital Twins. Posed by Neyman (1923a), who showed unbiased point…

Machine Learning · Statistics 2025-06-17 Xingya Wang, Yang Han, Yushi Liu, Szu-Yu Tang, Jason C. Hsu

Improved Regret of Linear Ensemble Sampling

In this work, we close the fundamental gap between theory and practice by providing an improved regret bound for linear ensemble sampling. We prove that with an ensemble size logarithmic in $T$, linear ensemble sampling can achieve a frequentist…

Machine Learning · Statistics 2025-06-17 Harin Lee, Min-hwan Oh

Sliding-Window Thompson Sampling for Non-Stationary Settings

Non-stationary multi-armed bandits (NS-MABs) model sequential decision-making problems in which the expected rewards of a set of actions, a.k.a. arms, evolve over time. In this paper, we fill a gap in the literature by providing a novel…

Machine Learning · Statistics 2025-06-17 Marco Fiandri, Alberto Maria Metelli, Francesco Trovò
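The sliding-window idea in the entry above can be sketched for Bernoulli rewards. The Beta posterior of each arm is built only from observations in the last `window` rounds, so stale evidence is forgotten as the reward distributions drift. This is a generic illustration assuming Bernoulli rewards and uniform Beta(1, 1) priors; the class `SlidingWindowTS` and its methods are hypothetical, and the paper's exact algorithm and analysis may differ.

```python
import random
from collections import deque
from typing import Deque, List, Optional, Tuple


class SlidingWindowTS:
    """Bernoulli Thompson sampling whose posteriors use only the last `window` rounds."""

    def __init__(self, n_arms: int, window: int, rng: Optional[random.Random] = None) -> None:
        if n_arms < 1 or window < 1:
            raise ValueError("n_arms and window must be positive")
        self.n_arms = n_arms
        self.window = window
        self.rng = rng or random.Random()
        self.history: Deque[Tuple[int, int]] = deque()  # (arm, reward) pairs inside the window
        self.successes: List[int] = [0] * n_arms
        self.failures: List[int] = [0] * n_arms

    def select_arm(self) -> int:
        # Sample from Beta(1 + successes, 1 + failures) for each arm and play the argmax.
        samples = [
            self.rng.betavariate(1 + self.successes[a], 1 + self.failures[a])
            for a in range(self.n_arms)
        ]
        return max(range(self.n_arms), key=samples.__getitem__)

    def update(self, arm: int, reward: int) -> None:
        # Add the newest observation, then evict the oldest once the window overflows.
        self.history.append((arm, reward))
        self._count(arm, reward, +1)
        if len(self.history) > self.window:
            old_arm, old_reward = self.history.popleft()
            self._count(old_arm, old_reward, -1)

    def _count(self, arm: int, reward: int, delta: int) -> None:
        if reward:
            self.successes[arm] += delta
        else:
            self.failures[arm] += delta
```

The window length trades adaptivity against variance: a short window reacts quickly to changes but bases each posterior on few samples.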

Amortized Bayesian Multilevel Models

Multilevel models (MLMs) are a central building block of the Bayesian workflow. They enable joint, interpretable modeling of data across hierarchical levels and provide a fully probabilistic quantification of uncertainty. Despite their…

Machine Learning · Statistics 2025-06-17 Daniel Habermann, Marvin Schmitt, Lars Kühmichel, Andreas Bulling, Stefan T. Radev, Paul-Christian Bürkner