Machine Learning - Scifaro

Gaussian Process Assisted Meta-learning for Image Classification and Object Detection Models

Collecting operationally realistic data to inform machine learning models can be costly. Before collecting new data, it is helpful to understand where a model is deficient. For example, object detectors trained on images of rare objects may…

Machine Learning · Statistics 2025-12-24 Anna R. Flowers, Christopher T. Franck, Robert B. Gramacy, Justin A. Krometis

Quasiprobabilistic Density Ratio Estimation with a Reverse Engineered Classification Loss Function

We consider a generalization of the classifier-based density-ratio estimation task to a quasiprobabilistic setting where probability densities can be negative. The problem with most loss functions used for this task is that they implicitly…

Machine Learning · Statistics 2025-12-24 Matthew Drnevich, Stephen Jiggins, Kyle Cranmer

Robust Causal Directionality Inference in Quantum Inference under MNAR Observation and High-Dimensional Noise

In quantum mechanics, observation actively shapes the system, paralleling the statistical notion of Missing Not At Random (MNAR). This study introduces a unified framework for robust causal directionality inference in quantum…

Machine Learning · Statistics 2025-12-24 Joonsung Kang

One Permutation Is All You Need: Fast, Reliable Variable Importance and Model Stress-Testing

Reliable estimation of feature contributions in machine learning models is essential for trust, transparency and regulatory compliance, especially when models are proprietary or otherwise operate as black boxes. While permutation-based…

Machine Learning · Statistics 2025-12-24 Albert Dorador

A Bayesian cluster validity index

Selecting the appropriate number of clusters is a critical step in applying clustering algorithms. To assist in this process, various cluster validity indices (CVIs) have been developed. These indices are designed to identify the optimal…

Machine Learning · Statistics 2025-12-24 Nathakhun Wiroonsri, Onthada Preedasawakul

Boosted Control Functions: Distribution generalization and invariance in confounded models

Modern machine learning methods and the availability of large-scale data have significantly advanced our ability to predict target quantities from large sets of covariates. However, these methods often struggle under distributional shifts,…

Machine Learning · Statistics 2025-12-24 Nicola Gnecco, Jonas Peters, Sebastian Engelke, Niklas Pfister

A correlation-based fuzzy cluster validity index with secondary options detector

The optimal number of clusters is one of the main concerns when applying cluster analysis. Several cluster validity indexes have been introduced to address this problem. However, in some situations, there is more than one option that can be…

Machine Learning · Statistics 2025-12-24 Nathakhun Wiroonsri, Onthada Preedasawakul

On Conditional Stochastic Interpolation for Generative Nonlinear Sufficient Dimension Reduction

Identifying low-dimensional sufficient structures in nonlinear sufficient dimension reduction (SDR) has long been a fundamental yet challenging problem. Most existing methods lack theoretical guarantees of exhaustiveness in identifying…

Machine Learning · Statistics 2025-12-23 Shuntuo Xu, Zhou Yu, Jian Huang

Causal Inference as Distribution Adaptation: Optimizing ATE Risk under Propensity Uncertainty

Standard approaches to causal inference, such as Outcome Regression and Inverse Probability Weighted Regression Adjustment (IPWRA), are typically derived through the lens of missing data imputation and identification theory. In this work,…

Machine Learning · Statistics 2025-12-23 Ashley Zhang

Sampling from multimodal distributions with warm starts: Non-asymptotic bounds for the Reweighted Annealed Leap-Point Sampler

Sampling from multimodal distributions is a central challenge in Bayesian inference and machine learning. In light of hardness results for sampling -- classical MCMC methods, even with tempering, can suffer from exponential mixing times --…

Machine Learning · Statistics 2025-12-23 Holden Lee, Matheau Santana-Gijzen

Disentangled representations via score-based variational autoencoders

We present the Score-based Autoencoder for Multiscale Inference (SAMI), a method for unsupervised representation learning that combines the theoretical frameworks of diffusion models and VAEs. By unifying their respective evidence lower…

Machine Learning · Statistics 2025-12-23 Benjamin S. H. Lyo, Eero P. Simoncelli, Cristina Savin

Universality of high-dimensional scaling limits of stochastic gradient descent

We consider statistical tasks in high dimensions whose loss depends on the data only through its projection into a fixed-dimensional subspace spanned by the parameter vectors and certain ground truth vectors. This includes classifying…

Machine Learning · Statistics 2025-12-23 Reza Gheissari, Aukosh Jagannath

Ensuring Calibration Robustness in Split Conformal Prediction Under Adversarial Attacks

Conformal prediction (CP) provides distribution-free, finite-sample coverage guarantees but critically relies on exchangeability, a condition often violated under distribution shift. We study the robustness of split conformal prediction…

Machine Learning · Statistics 2025-12-23 Xunlei Qian, Yue Xing

Density estimation via mixture discrepancy and moments

With the aim of generalizing histogram statistics to higher dimensional cases, density estimation via discrepancy based sequential partition (DSP) has been proposed to learn an adaptive piecewise constant approximation defined on a binary…

Machine Learning · Statistics 2025-12-23 Zhengyang Lei, Lirong Qu, Sihong Shao, Yunfeng Xiong

Theoretical Convergence Guarantees for Variational Autoencoders

Variational Autoencoders (VAE) are popular generative models used to sample from complex data distributions. Despite their empirical success in various machine learning tasks, significant gaps remain in understanding their theoretical…

Machine Learning · Statistics 2025-12-23 Sobihan Surendran, Antoine Godichon-Baggioni, Sylvain Le Corff

Imputation Uncertainty in Interpretable Machine Learning Methods

In real data, missing values occur frequently, which affects the interpretations produced by interpretable machine learning (IML) methods. Recent work considers bias and shows that model explanations may differ between imputation methods, while…

Machine Learning · Statistics 2025-12-22 Pegah Golchian, Marvin N. Wright

Generative Multi-Objective Bayesian Optimization with Scalable Batch Evaluations for Sample-Efficient De Novo Molecular Design

Designing molecules that must satisfy multiple, often conflicting objectives is a central challenge in molecular discovery. The enormous size of chemical space and the cost of high-fidelity simulations have driven the development of machine…

Machine Learning · Statistics 2025-12-22 Madhav R. Muthyala, Farshud Sorourifar, Tianhong Tan, You Peng, Joel A. Paulson

Perfect reconstruction of sparse signals using nonconvexity control and one-step RSB message passing

We consider sparse signal reconstruction via minimization of the smoothly clipped absolute deviation (SCAD) penalty, and develop one-step replica-symmetry-breaking (1RSB) extensions of approximate message passing (AMP), termed 1RSB-AMP.…

Machine Learning · Statistics 2025-12-22 Xiaosi Gu, Ayaka Sakata, Tomoyuki Obuchi

Generalized infinite dimensional Alpha-Procrustes based geometries

This work extends the recently introduced Alpha-Procrustes family of Riemannian metrics for symmetric positive definite (SPD) matrices by incorporating generalized versions of the Bures-Wasserstein (GBW), Log-Euclidean, and Wasserstein…

Machine Learning · Statistics 2025-12-22 Salvish Goomanee, Andi Han, Pratik Jawanpuria, Bamdev Mishra

Quantifying Uncertainty in the Presence of Distribution Shifts

Neural networks make accurate predictions but often fail to provide reliable uncertainty estimates, especially under covariate distribution shifts between training and testing. To address this problem, we propose a Bayesian framework for…

Machine Learning · Statistics 2025-12-22 Yuli Slavutsky, David M. Blei