Statistics
We study denoising score matching (DSM) when the latent distribution is supported on a smooth embedded manifold $M \subset \mathbb{R}^D$. Under ambient Gaussian corruption, the tangent denoising target contains a singular normal-fiber noise…
Sequential change-point detection in non-Gaussian stochastic processes is challenging because the underlying densities are rarely known in real time. Classical parametric procedures such as CUSUM lose optimality under distributional…
Training a language model on data that is scattered across bandwidth-limited nodes and cannot be centralized is a setting that arises in clinical networks, enterprise knowledge bases, and scientific consortia. We study the regime in which data…
In this paper, we propose an invariant quantile regression (IQR) framework for multi-environment datasets that captures invariance across environments. This framework is closely related to transfer…
We argue that formal certification of AI alignment over open-ended or unbounded input domains is impossible under standard assumptions in computational complexity and learning theory, and characterise what remains achievable. Two…
While the point-centred quarter method (PCQM) is widely used for density estimation, existing methods for handling right-censored data from truncated search radii rely primarily on a Poisson model assuming complete spatial randomness (CSR),…
Heteroscedasticity -- where the variance of a variable changes with other variables -- is pervasive in real data, and elucidating why it arises from the perspective of statistical moments is crucial in scientific knowledge discovery and…
Temperature scaling is a simple method for controlling the uncertainty of probabilistic models. It is mostly used in two contexts: improving the calibration of classifiers and tuning the stochasticity of large language models (LLMs).…
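As a minimal illustration of the general idea (not the method of the paper summarized above), temperature scaling divides the logits by a positive temperature T before the softmax. The function names `softmax_with_temperature` and `entropy` and the example logits below are assumptions made for this sketch.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature; the temperature must be positive."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats, used here as a simple uncertainty measure."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical logits for a three-class prediction.
logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, 0.5)  # T < 1 sharpens the distribution
base = softmax_with_temperature(logits, 1.0)   # T = 1 is the ordinary softmax
flat = softmax_with_temperature(logits, 2.0)   # T > 1 flattens the distribution
```

In this sketch, lowering T concentrates probability on the largest logit and raising T spreads it out, so the entropy of the output increases with T.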
Discrete flow models (DFMs) have been proposed to learn data distributions on finite state spaces, offering a flexible alternative to discrete diffusion models. A line of recent work has studied samplers for discrete…
Lineage marker population frequencies can serve as one way to express evidential value in forensic genetics. However, for high-quality whole mitochondrial DNA genome sequences (mitogenomes), population data remain limited. In this paper, we…
Overbounds of heavy-tailed measurement errors are essential to meet stringent navigation requirements in integrity monitoring applications. This paper proposes to leverage the bounding sharpness of the Cauchy distribution in the core and…
Data assimilation (DA) is a cornerstone of scientific and engineering applications, combining model forecasts with sparse and noisy observations to estimate latent system states. Classical high-dimensional DA methods, such as the ensemble…
Causal representation learning (CRL) has garnered increasing interest from the causal inference and artificial intelligence communities due to its potential to disentangle complex data-generating mechanisms into causally interpretable latent…
Double robustness is a major selling point of semiparametric and missing data methodology. Its virtues lie in protection against partial nuisance misspecification and asymptotic semiparametric efficiency under correct nuisance…
Bayesian optimal experimental design (OED) provides a principled framework for selecting observations or experiments. We introduce new Bayesian design criteria based on the expected Wasserstein-$p$ distance between the prior and posterior…
High-dimensional categorical data arise in diverse scientific domains and are often accompanied by covariates. Latent class regression models are routinely used in such settings, reducing dimensionality by assuming conditional independence…
Bayesian nonparametric methods are naturally suited to the problem of out-of-distribution (OOD) detection. However, these techniques have largely been eschewed in favor of simpler methods based on distances between pre-trained or learned…
Isometry pursuit is a convex algorithm for identifying orthonormal column-submatrices of wide matrices. It consists of a novel normalization method followed by multitask basis pursuit. Applied to Jacobians of putative coordinate functions,…
We consider conformal prediction for multivariate data and focus on hierarchical data, where some components are linear combinations of others. Intuitively, the hierarchical structure can be leveraged to reduce the size of prediction…
Several variational bounds involving importance weighting ideas generalize the Evidence Lower BOund (ELBO) for marginal likelihood optimization, such as the Importance-weighted Auto-Encoder (IWAE), Variational Rényi (VR) and VR-IWAE…