Statistics
Diffusion models are central to generative modeling and have been adapted to graphs by diffusing adjacency matrix representations. The challenge of having up to $n!$ such representations for graphs with $n$ nodes is only partially mitigated…
Conformal prediction provides distribution-free prediction sets with finite-sample conditional guarantees. We build upon the RKHS-based framework of Gibbs et al. (2023), which leverages families of covariate shifts to provide approximate…
Sequential Bayesian experimental design typically assumes that the number of experiments is fixed before data collection begins. In practical campaigns, however, experimentation may need to terminate early because additional measurements…
Semi-supervised learning (SSL) arises in practice when labeled data are scarce or expensive to obtain, while large quantities of unlabeled data are readily available. With the growing adoption of machine learning techniques, it has become…
We develop a new classification framework based on the theory of coherent risk measures and systemic risk. The proposed approach is suitable for multi-class problems when the data is noisy, scarce (relative to the dimension of the problem),…
We study local linear convergence of gradient descent for finite-width feedforward networks under the squared empirical loss. Prior work shows that GD can remain confined to a Locally Quasi-Convex Region (LQCR) around initialization, but…
The current state of evaluation in survival analysis is plagued by the persistent use of evaluation metrics in ways that are misaligned with the stated modeling objective. In addition, many such evaluations are based on censoring…
Boundary Discontinuity (BD) designs are used in empirical research to learn about causal treatment effects along a continuous assignment boundary defined by a bivariate score. These designs are also known as multi-score regression…
This paper presents a novel approach to classical linear regression, enabling model computation from data streams or in a distributed setting while preserving data privacy in federated environments. We extend this framework to generalized…
Structural and practical parameter non-identifiability issues are common when mathematical models are used to interpret data. Such issues motivate model reparameterisation and reduction methods. Here, we consider Invariant Image…
Differential privacy (DP) provides robust privacy guarantees for statistical inference, but this can lead to unreliable results and biases in downstream applications. While several noise-aware approaches have been proposed which integrate…
Principal component analysis (PCA) is a fundamental tool for analyzing multivariate data. Here the focus is on dimension reduction to the principal subspace, characterized by its projection matrix. The classical principal subspace can be…
We explore methods to reduce the impact of unobserved confounders on the causal mediation analysis of high-dimensional mediators with spatially smooth structures, such as brain imaging data. The key approach is to incorporate the latent…
Ecological and conservation studies monitoring bird communities typically rely on species classification based on bird vocalizations. Historically, this has been based on expert volunteers going into the field and making lists of the bird…
This paper studies parametric bootstrap methods for network data, with the goal of quantifying the uncertainty of network statistics of interest. While existing network resampling methods primarily focus on count statistics under…
Numerical simulators are widely used to model physical phenomena and global sensitivity analysis (GSA) aims at studying the global impact of the input uncertainties on the simulator output. To perform GSA, statistical tools based on…
We develop Microcanonical Hamiltonian Monte Carlo (MCHMC), a class of models which follow a fixed energy Hamiltonian dynamics, in contrast to Hamiltonian Monte Carlo (HMC), which follows canonical distribution with different energy levels.…
Randomized controlled trials (RCTs) are the gold standard for evaluating causal effects but are often costly and difficult to scale; consequently, they are frequently augmented with auxiliary external controls in many applications. Prior…
Robustness of neural networks is commonly quantified via local or global Lipschitz constants. However, Lipschitz continuity can be overly coarse or overly restrictive as global robustness measure, failing to capture nuanced, data-dependent…
The e-value is gaining traction as a robust alternative to p-values and Bayes factors for quantifying statistical evidence. e-values are a promising method for adaptive clinical trials due to their anytime-validity: e-values ensure type I…