机器学习
The Gromov-Wasserstein (GW) variant of optimal transport, designed to compare probability densities defined over distinct metric spaces, has emerged as an important tool for the analysis of data with complex structure, such as ensembles of…
Given $n$ i.i.d. random matrices $A_i \in \mathbb{R}^{d \times d}$ that share a common expectation $\Sigma$, the objective of Differentially Private Stochastic PCA is to identify a subspace of dimension $k$ that captures the largest…
We establish non-asymptotic bounds on the finite-sample behavior of generalized first-order iterative algorithms -- including gradient-based optimization methods and approximate message passing (AMP) -- with Gaussian data matrices and…
The presence of missing values often reflects variations in data collection policies, which may shift across time or locations, even when the underlying feature distribution remains stable. Such shifts in the missingness distribution…
The Two-Stage Learning-to-Defer (L2D) framework has been extensively studied for classification and, more recently, regression tasks. However, many real-world applications require solving both tasks jointly in a multi-task setting. We…
This paper introduces a hypothesis space for deep learning based on deep neural networks (DNNs). By treating a DNN as a function of two variables - the input variable and the parameter variable - we consider the set of DNNs where the…
Finding the optimal solution is often the primary goal in combinatorial optimization (CO). However, real-world applications frequently require diverse solutions rather than a single optimum, particularly in two key scenarios. The first…
The interpretability of generative models is considered a key factor in demonstrating their effectiveness and controllability. The generated data are believed to be determined by latent variables that are not directly observable. Therefore,…
We present two analytical formulae for estimating the sensitivity -- namely, the gradient or Jacobian -- at given realizations of an arbitrary-dimensional random vector with respect to its distributional parameters. The first formula…
We propose a novel method, termed the M-learner, for estimating heterogeneous indirect and total treatment effects and identifying relevant subgroups within a mediation framework. The procedure comprises four key steps. First, we compute…
In this paper, we study a strictly stationary Markov chain gradient descent algorithm operating in general Hilbert spaces. Our analysis focuses on the mixing coefficients of the underlying process, specifically the $\phi$- and…
Multi-view data provides complementary information on the same set of observations, with multi-omics and multimodal sensor data being common examples. Analyzing such data typically requires distinguishing between shared (joint) and unique…
In order to sample from an unnormalized probability density function, we propose to combine continuous normalizing flows (CNFs) with rejection-resampling steps based on importance weights. We relate the iterative training of CNFs with…
Backpropagation (BP) of errors is the backbone training algorithm for artificial neural networks (ANNs). It updates network weights through gradient descent to minimize a loss function representing the mismatch between predictions and…
Recent advances in machine learning have greatly expanded the repertoire of predictive methods for medical imaging. However, the interpretability of complex models remains a challenge, which limits their utility in medical applications.…
The Bayesian conjugate gradient method offers probabilistic solutions to linear systems but suffers from poor calibration, limiting its utility in uncertainty quantification tasks. Recent approaches leveraging postiterations to construct…
We consider applying stochastic approximation (SA) methods to solve nonsmooth variational inclusion problems. Existing studies have shown that the averaged iterates of SA methods exhibit asymptotic normality, with an optimal limiting…
Quantile regression is a powerful tool for robust and heterogeneous learning that has seen applications in a diverse range of applied areas. However, its broader application is often hindered by the substantial computational demands arising…
Posterior sampling has been shown to be a powerful Bayesian approach for solving imaging inverse problems. The recent plug-and-play unadjusted Langevin algorithm (PnP-ULA) has emerged as a promising method for Monte Carlo sampling and…
Off-policy estimation (OPE) methods enable unbiased offline evaluation of recommender systems, directly estimating the online reward some target policy would have obtained, from offline data and with statistical guarantees. The theoretical…