Statistics
Bayesian inference provides principled uncertainty quantification but is often limited by the challenges of prior and likelihood elicitation. The martingale posterior (MGP) (Fong et al., 2023) offers an alternative by replacing these…
We study the decoupled multi-armed bandit problem, where the learner separately selects one arm for exploration and one, possibly different, arm for exploitation at each round. In this setting, the loss of the explored arm is observed but…
Learning-to-Defer (L2D) enables hybrid decision-making by routing inputs either to a predictor or to external experts. While promising, L2D is highly vulnerable to adversarial perturbations, which can not only flip predictions but also…
Generative models frequently suffer miscalibration, wherein statistics of the sampling distribution, such as the fraction of generations in a given class, deviate from desired values. We frame calibration as a constrained optimization…
Diffusion models are central to generative modeling and have been adapted to graphs by diffusing adjacency matrix representations. The challenge of having up to $n!$ such representations for graphs with $n$ nodes is only partially mitigated…
Conformal prediction provides distribution-free prediction sets with finite-sample conditional guarantees. We build upon the RKHS-based framework of Gibbs et al. (2023), which leverages families of covariate shifts to provide approximate…
Sequential Bayesian experimental design typically assumes that the number of experiments is fixed before data collection begins. In practical campaigns, however, experimentation may need to terminate early because additional measurements…
Semi-supervised learning (SSL) arises in practice when labeled data are scarce or expensive to obtain, while large quantities of unlabeled data are readily available. With the growing adoption of machine learning techniques, it has become…
We develop a new classification framework based on the theory of coherent risk measures and systemic risk. The proposed approach is suitable for multi-class problems when the data is noisy, scarce (relative to the dimension of the problem),…
We study local linear convergence of gradient descent for finite-width feedforward networks under the squared empirical loss. Prior work shows that GD can remain confined to a Locally Quasi-Convex Region (LQCR) around initialization, but…
The current state of evaluation in survival analysis is plagued by the persistent use of evaluation metrics in ways that are misaligned with the stated modeling objective. In addition, many such evaluations are based on censoring…
Boundary Discontinuity (BD) designs are used in empirical research to learn about causal treatment effects along a continuous assignment boundary defined by a bivariate score. These designs are also known as multi-score regression…
This paper presents a novel approach to classical linear regression, enabling model computation from data streams or in a distributed setting while preserving data privacy in federated environments. We extend this framework to generalized…
Structural and practical parameter non-identifiability issues are common when mathematical models are used to interpret data. Such issues motivate model reparameterisation and reduction methods. Here, we consider Invariant Image…
Differential privacy (DP) provides robust privacy guarantees for statistical inference, but this can lead to unreliable results and biases in downstream applications. While several noise-aware approaches have been proposed which integrate…
Principal component analysis (PCA) is a fundamental tool for analyzing multivariate data. Here the focus is on dimension reduction to the principal subspace, characterized by its projection matrix. The classical principal subspace can be…
We explore methods to reduce the impact of unobserved confounders on the causal mediation analysis of high-dimensional mediators with spatially smooth structures, such as brain imaging data. The key approach is to incorporate the latent…
Ecological and conservation studies monitoring bird communities typically rely on species classification based on bird vocalizations. Historically, this has been based on expert volunteers going into the field and making lists of the bird…
This paper studies parametric bootstrap methods for network data, with the goal of quantifying the uncertainty of network statistics of interest. While existing network resampling methods primarily focus on count statistics under…
Numerical simulators are widely used to model physical phenomena and global sensitivity analysis (GSA) aims at studying the global impact of the input uncertainties on the simulator output. To perform GSA, statistical tools based on…