机器学习
While gradient-based discrete samplers are effective in sampling from complex distributions, they are susceptible to getting trapped in local minima, particularly in high-dimensional, multimodal discrete distributions, owing to the…
Neural networks are fragile when confronted with data that significantly deviates from their training distribution. This is true in particular for simulation-based inference methods, such as neural amortized Bayesian inference (ABI), where…
With the rapid growth in the number of Large Language Models (LLMs), there has been a recent interest in LLM routing, or directing queries to the cheapest LLM that can deliver a suitable response. We conduct a minimax analysis of the…
Adaptive bandwidth selection is a fundamental challenge in nonparametric regression. This paper introduces a new bandwidth selection procedure inspired by the optimality criteria for $\ell_0$-penalized regression. Although similar in spirit…
Sampling from high-dimensional distributions has wide applications in data science and machine learning but poses significant computational challenges. We introduce Subspace Langevin Monte Carlo (SLMC), a novel and efficient sampling method…
Many recent theoretical works on \emph{meta-learning} aim to achieve guarantees in leveraging similar representational structures from related tasks towards simplifying a target task. The main aim of theoretical guarantees on the subject is…
Independence testing is a classical statistical problem that has been extensively studied in the batch setting when one fixes the sample size before collecting data. However, practitioners often prefer procedures that adapt to the…
The quality of generalized linear models (GLMs), frequently used by insurance companies, depends on the choice of interacting variables. The search for interactions is time-consuming, especially for data sets with a large number of…
Importance sampling is a Monte Carlo technique for efficiently estimating the likelihood of rare events by biasing the sampling distribution towards the rare event of interest. By drawing weighted samples from a learned proposal…
Counterfactuals play a pivotal role in the two distinct data science fields of causal inference (CI) and explainable artificial intelligence (XAI). While the core idea behind counterfactuals remains the same in both fields--the examination…
This paper considers the estimation of quantiles via a smoothed version of the stochastic gradient descent (SGD) algorithm. By smoothing the score function in the conventional SGD quantile algorithm, we achieve monotonicity in the quantile…
Distributional robustness is a central goal of prediction algorithms due to the prevalent distribution shifts in real-world data. The prediction model aims to minimize the worst-case risk among a class of distributions, a.k.a., an…
Multi-modal contrastive learning as a self-supervised representation learning technique has achieved great success in foundation model training, such as CLIP~\citep{radford2021learning}. In this paper, we study the theoretical properties of…
Gaussian Process based Bayesian Optimization is a widely applied algorithm to learn and optimize under uncertainty, well-known for its sample efficiency. However, recently -- and more frequently -- research studies have empirically…
This paper introduces a novel nonparametric method for estimating high-dimensional dynamic covariance matrices with multiple conditioning covariates, leveraging random forests and supported by robust theoretical guarantees. Unlike…
Over the past decades, there has been a surge of interest in studying low-dimensional structures within high-dimensional data. Statistical factor models $-$ i.e., low-rank plus diagonal covariance structures $-$ offer a powerful framework…
We consider the problem of inferring the conditional independence graph (CIG) of high-dimensional Gaussian vectors from multi-attribute data. Most existing methods for graph estimation are based on single-attribute models where one…
Sequential Monte Carlo (SMC) methods offer a principled approach to Bayesian uncertainty quantification but are traditionally limited by the need for full-batch gradient evaluations. We introduce a scalable variant by incorporating…
The partitioning of data for estimation and calibration critically impacts the performance of propensity score based estimators like inverse probability weighting (IPW) and double/debiased machine learning (DML) frameworks. We extend recent…
In this work, we study the training and generalization performance of two-layer neural networks (NNs) after one gradient descent step under structured data modeled by Gaussian mixtures. While previous research has extensively analyzed this…