Statistics
Conformal prediction is a framework for providing prediction intervals with distribution-free validity, guaranteeing predictive coverage for data drawn from any distribution. Its two main variants are full conformal prediction and split…
Marine corrosion significantly reduces a ship's availability, increases costs of operation and could impact safety. Protective coatings mitigate these risks, but their effectiveness deteriorates over time. Early detection of coating…
Storage tanks for hazardous liquids are common in industry and agriculture. During a pollution incident, liquid may drain from a storage tank through a small hole, crack, or pipe. After containing the leak, estimating the discharged volume…
Response times collected in computerised assessments provide information about the underlying response process and may exhibit within-person variation over the course of a test. We propose a latent variable model for log response times that…
Spatial individual-level models (ILMs) provide a flexible framework for modelling infectious disease transmission across populations with known locations. Bayesian inference for these models relies on Markov chain Monte Carlo (MCMC), which…
Federated Conformal RAG (FC-RAG) provides distribution-free coverage for a bandwidth-limited swarm of weak language models, but only at a fixed horizon. We extend it to anytime-valid sequential coverage: validity at every stopping time,…
Generalized additive index models (GAIMs) offer a flexible semiparametric framework for capturing complex data relationships, balancing the interpretability of parametric models with the flexibility of nonparametric approaches. However,…
When surveillance data of infectious disease incidence (e.g. weekly case counts) are disaggregated by demographic indicators, disparities in long-run health outcomes between these groups become apparent. Accurate identification of high-risk…
Existing theory of momentum assumes that gradients arrive at every parameter at a roughly constant rate, an assumption violated in practice by heavy-tailed data distributions and modern architectures. We theoretically analyze the dynamics…
We study inference in stochastic block models (SBMs) through the lens of optimal transport (OT). We first establish that maximum likelihood variational inference (MLVI) can be interpreted as a semi-relaxed Gromov-Wasserstein (srGW)…
We study causal inference for time-to-event outcomes under right censoring in the presence of unmeasured confounding. Focusing on structural accelerated failure time models, we develop an identification and inference framework that exploits…
Traditional insurance pricing relies on risk-based principles that ensure actuarial fairness and solvency but do not explicitly account for policyholders' price sensitivity. We formulate insurance pricing as a decision-making problem and…
We introduce Triangular-Reference Schr\"odinger Bridges for Time Series (TR-SBTS), a conservative extension of the SBTS framework in which the Brownian reference is replaced by an intervalwise frozen, possibly degenerate diffusion…
Estimating how an outcome responds to a continuous treatment (the Average Dose-Response Function, or ADRF) is a core causal-inference primitive. However, when outcomes possess heavy tails, standard robust double machine learning (DML)…
Difference-in-differences (DiD) is a cornerstone of causal inference, yet extending it to functional outcomes is not a routine scalar generalization; rather, it entails three fundamental challenges in identification, inference, and…
In regression problems where covariates are naturally organized in a hierarchical tree structure, a central challenge is to select the resolution at which covariates enter the model. Determining this level of feature aggregation is of…
Low-dimensional embeddings are widely used as visual summaries of high-dimensional data and to enable downstream scientific discoveries. Yet, popular nonlinear dimension reduction methods, such as t-SNE and UMAP, are often selected based on…
Michaelis--Menten analysis is often conducted by nonlinear least squares under a constant-variance assumption, even though enzyme-kinetic data frequently display concentration-dependent heteroscedasticity and often include repeated or…
Learning-to-Defer (L2D) methods route each query either to a predictive model or to external experts. While existing work studies this problem in batch settings, real-world deployments require handling streaming data, changing expert…
Approximate Bayesian inference typically revolves around computing the posterior parameter distribution. In practice, however, the main object of interest is often a model's predictions rather than its parameters. In this work, we propose…