机器学习
Graph Neural Networks (GNNs) have achieved impressive performance in graph-related tasks. However, they suffer from poor generalization on out-of-distribution (OOD) data, as they tend to learn spurious correlations. Such correlations…
There remain theoretical gaps in deep neural network estimators for the nonparametric Cox proportional hazards model. In particular, it is unclear how gradient-based optimization error propagates to population risk under partial likelihood,…
Many scientific systems, such as cellular populations or economic cohorts, are naturally described by probability distributions that evolve over time. Predicting how such a system would have evolved under different forces or initial…
In clustering, strong dominance in the size of a particular cluster is often undesirable, motivating a measure of cluster size uniformity that can be used to filter such partitions. A basic requirement of such a measure is stability:…
Four distinct admissibility geometries govern sequential and distribution-free inference: Blackwell risk dominance over convex risk sets, anytime-valid admissibility within the nonnegative supermartingale cone, marginal coverage validity…
This paper argues that DNNs implement a computational Occam's razor -- finding the `simplest' algorithm that fits the data -- and that this could explain their incredible and wide-ranging success over more traditional statistical methods.…
Markov Chain Monte Carlo (MCMC) algorithms are essential tools in computational statistics for sampling from unnormalised probability distributions, but can be fragile when targeting high-dimensional, multimodal, or complex target…
The field of algorithms with predictions incorporates machine learning advice in the design of online algorithms to improve real-world performance. A central consideration is the extent to which predictions can be trusted -- while existing…
We study Leaky ResNets, which interpolate between ResNets and Fully-Connected nets depending on an 'effective depth' hyper-parameter $\tilde{L}$. In the infinite depth limit, we study 'representation geodesics' $A_{p}$: continuous paths in…
We investigate contextual graph matching in the Gaussian setting, where both edge weights and node features are correlated across two networks. We derive precise information-theoretic thresholds for exact recovery, and identify conditions…
Standard conformal anomaly detection provides marginal finite-sample guarantees under the assumption of exchangeability . However, real-world data often exhibit distribution shifts, necessitating a weighted conformal approach to adapt to…
Characteristic functions of weighted sums of independent random variables exhibit low-rank structure in the quantized tensor train (QTT) representation, also known as matrix product states (MPS), enabling up to exponential compression of…
We propose stepwise variational inference (VI) with vine copulas: a universal VI procedure that combines vine copulas with a novel stepwise estimation procedure of the variational parameters. Vine copulas consist of a nested sequence of…
Active learning reduces labeling costs by selecting samples that maximize information gain. A dominant framework, Query-by-Committee (QBC), typically relies on perturbation-based diversity by inducing model disagreement through random…
We consider a PAC-Bayes type learning rule for binary classification, balancing the training error of a randomized ''posterior'' predictor with its KL divergence to a pre-specified ''prior''. This can be seen as an extension of a modified…
Preference-based fine-tuning has become an important component in training large language models, and the data used at this stage may contain sensitive user information. A central question is how to design a differentially private pipeline…
We derive posterior contraction rates (PCRs) and finite-sample Bernstein von Mises (BvM) results for non-parametric Bayesian models by extending the diffusion-based framework of Mou et al. (2024) to the infinite-dimensional setting. The…
Knowledge distillation has emerged as a powerful technique for compressing large language models (LLMs) into efficient, deployable architectures while preserving their advanced capabilities. Recent advances in low-rank knowledge…
Social insect colonies and ensemble machine learning methods represent two of the most successful examples of decentralized information processing in nature and computation respectively. Here we develop a rigorous mathematical framework…
Model-based design of experiments (MBDOE) is essential for efficient parameter estimation in nonlinear dynamical systems. However, conventional adaptive MBDOE requires costly posterior inference and design optimization between each…