Statistics
Spatial variables can be observed in many different forms, such as regularly sampled random fields (lattice data), point processes, and randomly sampled spatial processes. Joint analysis of such collections of observations is clearly…
We consider an experimental design setting in which units are assigned to treatment after being sampled sequentially from an infinite population. We derive asymptotic efficiency bounds that apply to data from any experiment that assigns…
When a statistical model $\{P_{\theta} : \theta \in \Theta\}$ lacks analytically tractable likelihoods, parametric statistical inference based on data generated from an unknown underlying distribution $P$ can still be performed as long as…
Numerous studies have shown that microbial metabolites, which represent the products of bacteria in the human gut, play a key role in shaping cancer risk and response to treatment. However, metabolite data typically contain a large…
Causal mediation analysis decomposes the total treatment effect into a portion operating through a hypothesized mediator and a residual direct portion. Identification of natural direct and indirect effects typically rests on the mediator…
The Central Limit Theorem provides a foundation for inferential statistics and hypothesis testing. It describes how standardized statistics behave under repeated sampling from large populations. However, if the size of the sample (n)…
Federated Learning is a leading framework for training ML and AI models collaboratively across numerous user devices or databases. We study the trade-offs among estimation accuracy, privacy constraints, and communication cost for…
Inverse problems are ubiquitous in modern scientific studies and involve recovering an underlying signal from noisy observations often transformed by a measurement operator. These problems are frequently ill-posed, particularly in imaging,…
Directed Acyclic Graphs (DAGs) are central to uncovering causal structure in complex systems, yet learning a single DAG from data is often challenging: model uncertainty, finite samples, and a combinatorially large search space frequently…
Markov random fields are common prior distributions used in Bayesian inverse imaging problems. In particular, difference priors assign probability distributions to differences between neighbouring pixels, such as Gaussian, Laplace, or…
We present the Open-Source Sleep Monitor and Modulator (OSSMM), an open-source hardware and software platform for accessible sleep research. The OSSMM comprises a small wearable headband built from 3D prints and affordable…
Accurate estimates of item difficulty are essential for valid assessment and effective adaptive learning. However, for newly created tasks, response data are typically unavailable. Pretesting and expert judgement can be costly and slow,…
Coding and computation remain major bottlenecks in Markov chain Monte Carlo (MCMC) workflows, especially as modern sampling algorithms have become increasingly complex and existing probabilistic programming systems remain limited in model…
Generative models can produce nonsensical text, unrealistic images, and unstable materials faster than simulation or human review can absorb; without per-sample confidence, trust erodes. Existing fixes run $k$ ensembles or stochastic…
The functional ANOVA, or Hoeffding decomposition, provides a principled framework for interpretability by decomposing a model prediction into main effects and higher-order interactions. For independent inputs, this classical decomposition…
This paper proposes a novel framework for the approximation and analysis of circular density data using compositional periodic splines within Bayes spaces with the Hilbert space structure. By applying the centered log-ratio transformation,…
Champion recommendation in multiplayer online battle arena games is usually framed informally as a problem of metagame strength, personal comfort, or global win rate. We formalize champion recommendation in League of Legends as an…
Learning dynamical systems through operator-theoretic representations provides a powerful framework for analyzing complex dynamics, as spectral quantities such as eigenvalues and invariant structures encode characteristic time scales and…
The tree-structured varying coefficient (TSVC) model is a flexible approach for generalized regression, where the linear effects of the covariates are allowed to vary with the values of effect modifiers. Relevant effect modifiers and…
Discrete diffusion models are a powerful class of generative models with strong performance across many domains. For efficiency, however, discrete diffusion typically parameterizes the generative (reverse) process with factorized…