Statistics
Deploying clinical prediction models across healthcare systems often fails when key training covariates are unavailable at deployment and labeled outcomes are limited in the target domain. For example, high-performing models for…
This article addresses issues of model criticism and model comparison in Bayesian contexts, and focusses on the use of the so-called posterior predictive p-values (ppp values). These involve a general discrepancy or conflict measure and…
Longitudinal modified treatment policies (LMTP) are a class of interventions that allow the definition, identification, and estimation of causal effects in general settings, such as with continuous or multivariate exposures, treatment…
We study the problem of identifying dynamically distinct basins of attraction in high dimensional time-homogeneous Markov processes using only trajectory sampling. This problem is fundamental in the analysis of metastable dynamical systems,…
Heritability is a central concept in the long-standing debate about nature versus nurture in biological and social sciences. However, existing notions of heritability are based on strong assumptions and do not use explicit causal models. We…
The regression of principal component scores (RPCS) on covariates is a widely used analytic approach to detect and test for associations between functional measurements and study participant characteristics. Here we show that: (1) RPCS…
Modern Artificial Intelligence achieves remarkable predictive power by optimizing statistical risk functionals over vast corpora. Yet a gap separates this from genuine intelligence: the inability to distinguish correlation from causation.…
Finite-width fully connected neural networks with Gaussian-initialized weights deviate from their infinite-width Gaussian limit, exhibiting non-vanishing higher-order cumulants. We approximate these deviations, for a neural network…
We propose a novel kinetic Langevin sampler based on a specific splitting scheme using the exact harmonic Langevin integrator. For strongly log-concave target measures, the sampler exploits a decomposition of the strongly convex potential…
Regularized Adjusted Plus-Minus (RAPM) is the standard framework for estimating individual player impact in basketball. Its application requires possession-level stint data -- records of which five players shared the court for each…
Human feedback often arrives as preferences rather than calibrated numeric rewards, motivating reinforcement learning from preferential feedback, also referred to as reinforcement learning from human feedback (RLHF). We present a rigorous…
Survival analysis aims to model how covariates and time jointly shape the time-to-event distribution under right censoring. Classical methods such as the Cox model and generalised additive models (GAMs) require interactions and time-varying…
We propose and analyze a conservative drifting method for one-step generative modeling. The method replaces the original displacement-based drifting velocity by a kernel density estimator (KDE)-gradient velocity, namely the difference of…
Uniform sampling on implicitly defined manifolds is a core primitive in motion planning, constrained simulation, and probabilistic machine learning. MASEM addresses this problem by entropy-maximizing resampling, but its resampling weights…
Diffusion/score-based models have emerged as powerful generative models, capable of generating high-quality samples that mimic the training data distribution. However, it has been observed that they are prone to reproducing training…
Data assimilation (DA) addresses the problem of sequentially estimating the state of a dynamical system from noisy and incomplete observations. In this work, we employ a diffusion model as a world model to simulate and predict the system's…
Fractional polynomials are widely used for dose-response modelling, and recent Bayesian fractional polynomial work has renewed interest in this finite model class. We propose PMM-FP, a frequentist extension of Kunchenko's polynomial…
For stochastic process models, parameter inference is often severely bottlenecked by computationally expensive likelihood functions. Simulation-based inference (SBI) bypasses this restriction by constructing amortized surrogate likelihoods,…
Conjoint experiments randomize multidimensional profiles, offering a powerful design for recovering structural preference parameters -- including marginal rates of substitution, willingness to pay, and the distribution of preferences across…
We develop methods for estimating how infinitesimal policy changes affect long-term outcomes in dynamic systems. We show that dynamic marginal policy effects (MPEs) can be identified via tractable reduced-form expressions, and can be…