Machine Learning - Scifaro

Fairness in Criminal Justice Risk Assessments: The State of the Art

Objectives: Discussions of fairness in criminal justice risk assessments typically lack conceptual precision. Rhetoric too often substitutes for careful analysis. In this paper, we seek to clarify the tradeoffs between different kinds of…

Machine Learning · Statistics 2025-07-25 Richard A. Berk, Hoda Heidari, Shahin Jabbari, Michael Kearns, Aaron Roth

Sequential Bayesian Design for Efficient Surrogate Construction in the Inversion of Darcy Flows

Inverse problems governed by partial differential equations (PDEs) play a crucial role in various fields, including computational science, image processing, and engineering. In particular, the Darcy flow equation is a fundamental equation in…

Machine Learning · Statistics 2025-07-24 Hongji Wang, Hongqiao Wang, Jinyong Ying, Qingping Zhou

CoLT: The conditional localization test for assessing the accuracy of neural posterior estimates

We consider the problem of validating whether a neural posterior estimate $q(\theta \mid x)$ is an accurate approximation to the true, unknown posterior $p(\theta \mid x)$. Existing methods for evaluating the quality of an NPE…

Machine Learning · Statistics 2025-07-24 Tianyu Chen, Vansh Bansal, James G. Scott

The surprising strength of weak classifiers for validating neural posterior estimates

Neural Posterior Estimation (NPE) has emerged as a powerful approach for amortized Bayesian inference when the true posterior $p(\theta \mid y)$ is intractable or difficult to sample. But evaluating the accuracy of neural posterior…

Machine Learning · Statistics 2025-07-24 Vansh Bansal, Tianyu Chen, James G. Scott

Fundamental limits of distributed covariance matrix estimation via a conditional strong data processing inequality

Estimating high-dimensional covariance matrices is a key task across many fields. This paper explores the theoretical limits of distributed covariance estimation in a feature-split setting, where communication between agents is constrained.…

Machine Learning · Statistics 2025-07-24 Mohammad Reza Rahmani, Mohammad Hossein Yassaee, Mohammad Reza Aref

Bayesian Optimization of Robustness Measures under Input Uncertainty: A Randomized Gaussian Process Upper Confidence Bound Approach

Bayesian optimization based on the Gaussian process upper confidence bound (GP-UCB) offers a theoretical guarantee for optimizing black-box functions. In practice, however, black-box functions often involve input uncertainty. To handle such…

Machine Learning · Statistics 2025-07-24 Yu Inatsu

Fast post-process Bayesian inference with Variational Sparse Bayesian Quadrature

In applied Bayesian inference scenarios, users may have access to a large number of pre-existing model evaluations, for example from maximum-a-posteriori (MAP) optimization runs. However, traditional approximate inference techniques make…

Machine Learning · Statistics 2025-07-24 Chengkun Li, Grégoire Clarté, Martin Jørgensen, Luigi Acerbi

Structural Effect and Spectral Enhancement of High-Dimensional Regularized Linear Discriminant Analysis

Regularized linear discriminant analysis (RLDA) is a widely used tool for classification and dimensionality reduction, but its performance in high-dimensional scenarios is inconsistent. Existing theoretical analyses of RLDA often lack clear…

Machine Learning · Statistics 2025-07-23 Yonghan Zhang, Zhangni Pu, Lu Yan, Jiang Hu

PAC Off-Policy Prediction of Contextual Bandits

This paper investigates off-policy evaluation in contextual bandits, aiming to quantify the performance of a target policy using data collected under a different and potentially unknown behavior policy. Recently, methods based on conformal…

Machine Learning · Statistics 2025-07-23 Yilong Wan, Yuqiang Li, Xianyi Wu

Generative AI Models for Learning Flow Maps of Stochastic Dynamical Systems in Bounded Domains

Simulating stochastic differential equations (SDEs) in bounded domains presents significant computational challenges due to particle exit phenomena, which require accurate modeling of interior stochastic dynamics and boundary…

Machine Learning · Statistics 2025-07-23 Minglei Yang, Yanfang Liu, Diego del-Castillo-Negrete, Yanzhao Cao, Guannan Zhang

Structural DID with ML: Theory, Simulation, and a Roadmap for Applied Research

Causal inference in observational panel data has become a central concern in economics, policy analysis, and the broader social sciences. To address the core contradiction where traditional difference-in-differences (DID) struggles with…

Machine Learning · Statistics 2025-07-23 Yile Yu, Anzhi Xu, Yi Wang

Spectral Algorithms under Covariate Shift

Spectral algorithms leverage spectral regularization techniques to analyze and process data, providing a flexible framework for addressing supervised learning problems. To deepen our understanding of their performance in real-world…

Machine Learning · Statistics 2025-07-23 Jun Fan, Zheng-Chu Guo, Lei Shi

Randomization Can Reduce Both Bias and Variance: A Case Study in Random Forests

We study the often overlooked phenomenon, first noted by Breiman (2001), that random forests appear to reduce bias compared to bagging. Motivated by an interesting paper by Mentch and Zhou (2020), where the authors explain…

Machine Learning · Statistics 2025-07-23 Brian Liu, Rahul Mazumder

Hypergraphs on high dimensional time series sets using signature transform

In recent decades, hypergraphs and their analysis through Topological Data Analysis (TDA) have emerged as powerful tools for understanding complex data structures. Various methods have been developed to construct hypergraphs -- referred to…

Machine Learning · Statistics 2025-07-22 Rémi Vaucher, Paul Minchella

Conformal and kNN Predictive Uncertainty Quantification Algorithms in Metric Spaces

This paper introduces a framework for uncertainty quantification in regression models defined in metric spaces. Leveraging a newly defined notion of homoscedasticity, we develop a conformal prediction algorithm that offers finite-sample…

Machine Learning · Statistics 2025-07-22 Gábor Lugosi, Marcos Matabuena

Missing value imputation with adversarial random forests -- MissARF

Handling missing values is a common challenge in biostatistical analyses, typically addressed by imputation methods. We propose a novel, fast, and easy-to-use imputation method called missing value imputation with adversarial random forests…

Machine Learning · Statistics 2025-07-22 Pegah Golchian, Jan Kapar, David S. Watson, Marvin N. Wright

Accelerated Bayesian Optimal Experimental Design via Conditional Density Estimation and Informative Data

Design of Experiments (DOE) is a fundamental scientific methodology that provides researchers with systematic principles and techniques to enhance the validity, reliability, and efficiency of experimental outcomes. In this study, we…

Machine Learning · Statistics 2025-07-22 Miao Huang, Hongqiao Wang, Kunyu Wu

Learning under Latent Group Sparsity via Diffusion on Networks

Group or cluster structure on explanatory variables in machine learning problems is a very general phenomenon, which has attracted broad interest from practitioners and theoreticians alike. In this work we contribute an approach to sparse…

Machine Learning · Statistics 2025-07-22 Subhroshekhar Ghosh, Soumendu Sundar Mukherjee

Learning Nonlinear Causal Reductions to Explain Reinforcement Learning Policies

Why do reinforcement learning (RL) policies fail or succeed? This is a challenging question due to the complex, high-dimensional nature of agent-environment interactions. In this work, we take a causal perspective on explaining the behavior…

Machine Learning · Statistics 2025-07-22 Armin Kekić, Jan Schneider, Dieter Büchler, Bernhard Schölkopf, Michel Besserve

Uncertainty Quantification for Machine Learning-Based Prediction: A Polynomial Chaos Expansion Approach for Joint Model and Input Uncertainty Propagation

Machine learning (ML) surrogate models are increasingly used in engineering analysis and design to replace computationally expensive simulation models, significantly reducing computational cost and accelerating decision-making processes.…

Machine Learning · Statistics 2025-07-22 Xiaoping Du