机器学习
Objectives: Discussions of fairness in criminal justice risk assessments typically lack conceptual precision. Rhetoric too often substitutes for careful analysis. In this paper, we seek to clarify the tradeoffs between different kinds of…
Inverse problems governed by partial differential equations (PDEs) play a crucial role in various fields, including computational science, image processing, and engineering. Particularly, Darcy flow equation is a fundamental equation in…
We consider the problem of validating whether a neural posterior estimate \( q(\theta \mid x) \) is an accurate approximation to the true, unknown true posterior \( p(\theta \mid x) \). Existing methods for evaluating the quality of an NPE…
Neural Posterior Estimation (NPE) has emerged as a powerful approach for amortized Bayesian inference when the true posterior $p(\theta \mid y)$ is intractable or difficult to sample. But evaluating the accuracy of neural posterior…
Estimating high-dimensional covariance matrices is a key task across many fields. This paper explores the theoretical limits of distributed covariance estimation in a feature-split setting, where communication between agents is constrained.…
Bayesian optimization based on the Gaussian process upper confidence bound (GP-UCB) offers a theoretical guarantee for optimizing black-box functions. In practice, however, black-box functions often involve input uncertainty. To handle such…
In applied Bayesian inference scenarios, users may have access to a large number of pre-existing model evaluations, for example from maximum-a-posteriori (MAP) optimization runs. However, traditional approximate inference techniques make…
Regularized linear discriminant analysis (RLDA) is a widely used tool for classification and dimensionality reduction, but its performance in high-dimensional scenarios is inconsistent. Existing theoretical analyses of RLDA often lack clear…
This paper investigates off-policy evaluation in contextual bandits, aiming to quantify the performance of a target policy using data collected under a different and potentially unknown behavior policy. Recently, methods based on conformal…
Simulating stochastic differential equations (SDEs) in bounded domains, presents significant computational challenges due to particle exit phenomena, which requires accurate modeling of interior stochastic dynamics and boundary…
Causal inference in observational panel data has become a central concern in economics,policy analysis,and the broader social sciences.To address the core contradiction where traditional difference-in-differences (DID) struggles with…
Spectral algorithms leverage spectral regularization techniques to analyze and process data, providing a flexible framework for addressing supervised learning problems. To deepen our understanding of their performance in real-world…
We study the often overlooked phenomenon, first noted in \cite{breiman2001random}, that random forests appear to reduce bias compared to bagging. Motivated by an interesting paper by \cite{mentch2020randomization}, where the authors explain…
In recent decades, hypergraphs and their analysis through Topological Data Analysis (TDA) have emerged as powerful tools for understanding complex data structures. Various methods have been developed to construct hypergraphs -- referred to…
This paper introduces a framework for uncertainty quantification in regression models defined in metric spaces. Leveraging a newly defined notion of homoscedasticity, we develop a conformal prediction algorithm that offers finite-sample…
Handling missing values is a common challenge in biostatistical analyses, typically addressed by imputation methods. We propose a novel, fast, and easy-to-use imputation method called missing value imputation with adversarial random forests…
The Design of Experiments (DOEs) is a fundamental scientific methodology that provides researchers with systematic principles and techniques to enhance the validity, reliability, and efficiency of experimental outcomes. In this study, we…
Group or cluster structure on explanatory variables in machine learning problems is a very general phenomenon, which has attracted broad interest from practitioners and theoreticians alike. In this work we contribute an approach to sparse…
Why do reinforcement learning (RL) policies fail or succeed? This is a challenging question due to the complex, high-dimensional nature of agent-environment interactions. In this work, we take a causal perspective on explaining the behavior…
Machine learning (ML) surrogate models are increasingly used in engineering analysis and design to replace computationally expensive simulation models, significantly reducing computational cost and accelerating decision-making processes.…