Statistical Methodology
High-dimensional compositional covariates, often derived from count data, are subject to measurement error and are frequently analyzed after aggregation along a prespecified tree to improve interpretability in applications such as…
Quantile regression extends regression analysis beyond the conditional mean, providing a richer characterization of covariate effects across the outcome distribution. For sensitive binary outcomes, however, misclassification due to…
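The idea of looking "beyond the conditional mean" rests on the standard check (pinball) loss, rho_tau(u) = u * (tau - 1{u < 0}). The minimal sketch below shows that minimizing this loss over a constant recovers a sample quantile rather than the mean. The helper names are hypothetical, and the snippet does not model the misclassification problem the abstract goes on to address.

```python
import numpy as np

def check_loss(u, tau):
    """Pinball loss rho_tau(u) = u * (tau - 1{u < 0}), applied elementwise."""
    return u * (tau - (u < 0).astype(float))

def best_constant(y, tau, grid):
    # Total check loss for each candidate constant; keep the minimizer.
    losses = [check_loss(y - c, tau).sum() for c in grid]
    return grid[int(np.argmin(losses))]

y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
grid = np.linspace(0.0, 100.0, 10001)
print(best_constant(y, 0.5, grid))  # the sample median
print(best_constant(y, 0.9, grid))  # an upper quantile
print(y.mean())                     # the mean, pulled toward the outlier
```

With tau = 0.5 the minimizer is the median (3.0) while the mean is 22.0, which illustrates why different quantiles can describe covariate effects that the mean hides.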
The average treatment effect can obscure important heterogeneity when individuals respond differently to a treatment. While the conditional average treatment effect (CATE) function captures such heterogeneity, it is difficult to communicate…
Interval-censored data arise frequently in scientific studies, where the event of interest is known only to occur within a specific time interval. In such studies, functional covariates taking the form of continuous curves or spatial…
Income inequality is a major contributor to health disparities, yet its effects often vary by geography and are commonly represented as compositional distributions (e.g., proportions of households across income brackets). Existing spatial…
Sequential hypothesis tests are widely adopted as a principled way to perform multiple tests on data that arrive over time. In particular, researchers frequently use group sequential hypothesis tests (GST) to test the same hypotheses…
Since the introduction of network psychometrics, several connections to statistical models in "classical" psychometrics (i.e., IRT, SEM, GLM) as well as to approaches from other research fields have been established. In this paper, these…
Bipartite networks, which encode interactions between two distinct types of entities, arise widely in applications and exhibit inherent asymmetry across node sets. Despite a growing literature on bipartite community detection, estimating…
Non-negative matrix factorization (NMF) approximates a non-negative endogenous data matrix as $Y_1 \approx XB$, with non-negative latent components $X$ and coefficients $B$. Standard covariate-aware NMF is feedforward: $B$ depends only on…
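For reference, the standard feedforward factorization $Y_1 \approx XB$ (without covariates) can be fitted with the well-known Lee-Seung multiplicative updates for squared Frobenius loss. This is a minimal sketch of plain NMF only, not the covariate-aware method the abstract introduces; the function name, rank, and iteration count are illustrative assumptions.

```python
import numpy as np

def nmf(Y, k, n_iter=500, eps=1e-10, seed=0):
    """Factor non-negative Y (n x p) as X (n x k) @ B (k x p) via multiplicative updates."""
    rng = np.random.default_rng(seed)
    n, p = Y.shape
    X = rng.random((n, k))
    B = rng.random((k, p))
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so X and B stay non-negative.
        B *= (X.T @ Y) / (X.T @ X @ B + eps)
        X *= (Y @ B.T) / (X @ B @ B.T + eps)
    return X, B

rng = np.random.default_rng(1)
Y = rng.random((20, 3)) @ rng.random((3, 15))  # synthetic non-negative, rank-3 data
X, B = nmf(Y, k=3)
rel_err = np.linalg.norm(Y - X @ B) / np.linalg.norm(Y)
print(X.shape, B.shape, rel_err)
```

Here B depends only on Y through the fit, which is the "feedforward" structure the abstract contrasts its approach with.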
An ongoing challenge in animal ecology is developing movement models that account for the autocorrelation, and often temporal irregularity, in telemetry data. Continuous-time Langevin diffusion models have been proposed to model temporally…
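A Langevin diffusion for movement is often written as dX_t = (gamma^2 / 2) grad log pi(X_t) dt + gamma dW_t, where pi is a target (utilization) density. The sketch below assumes that form and a hypothetical Gaussian pi, and simulates a track with the Euler-Maruyama scheme at irregularly spaced observation times. It illustrates the model class only, not the specific method this abstract proposes.

```python
import numpy as np

def grad_log_pi(x, mu, sigma2):
    # Gradient of log N(mu, sigma2 * I): a hypothetical stand-in for a habitat-based density.
    return -(x - mu) / sigma2

def simulate_langevin(times, x0, gamma, mu, sigma2, seed=0):
    """Euler-Maruyama for dX = (gamma^2 / 2) grad log pi(X) dt + gamma dW at the given times."""
    rng = np.random.default_rng(seed)
    xs = [np.asarray(x0, dtype=float)]
    for dt in np.diff(times):
        x = xs[-1]
        drift = 0.5 * gamma**2 * grad_log_pi(x, mu, sigma2)
        # Step size dt varies, mimicking irregular telemetry fixes.
        xs.append(x + drift * dt + gamma * np.sqrt(dt) * rng.standard_normal(x.shape))
    return np.array(xs)

rng = np.random.default_rng(42)
times = np.sort(rng.uniform(0.0, 50.0, size=200))  # irregular fix times
track = simulate_langevin(times, x0=[5.0, -5.0], gamma=1.0, mu=np.zeros(2), sigma2=4.0)
print(track.shape)  # (200, 2)
```

Euler-Maruyama is only an approximation, and its error grows with the gap between fixes, which is one reason irregular sampling makes inference for these models harder.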
Hybrid type 2 studies are gaining popularity for their ability to assess both implementation and health outcomes as co-primary endpoints. Often conducted as cluster-randomized trials (CRTs), five design methods can validly power these…
We study a minimal change to an observation-driven Bayesian Dirichlet ARMA (B-DARMA) for compositional time series: replace the raw additive log-ratio (ALR) residual in the moving-average block with a centered innovation that subtracts the…
We present a novel framework (TS+TT) to nest a Target Study (TS) within a Target Trial (TT) for evaluating the effects of interventions on disparities. The TS component grounds the measurement of disparity in ethical assumptions, based on…
In Generalised Bayesian Inference (GBI), the learning rate and hyperparameters of the loss must be estimated. These inference-hyperparameters cannot be estimated from the data jointly with the other parameters by giving them a prior.…
This paper develops a performant Bayesian approach to conditional average treatment effect (CATE) estimation in regression discontinuity designs (RDD), an increasingly prevalent form of quasi-experiment that facilitates causal inference.…
Rerandomization is an effective treatment allocation procedure to control for baseline covariate imbalance. For estimating the average treatment effect, rerandomization has been previously shown to improve the precision of the unadjusted…
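A common form of rerandomization redraws complete randomizations until a Mahalanobis-distance measure of covariate imbalance falls below a threshold. The sketch below assumes that acceptance rule; the threshold `a`, the helper names, and the simulated covariates are hypothetical, and the snippet covers only the allocation step, not the effect estimators the abstract compares.

```python
import numpy as np

def mahalanobis_imbalance(Z, w):
    """Mahalanobis distance between treated and control covariate means."""
    n = len(w)
    n1 = w.sum()
    diff = Z[w == 1].mean(axis=0) - Z[w == 0].mean(axis=0)
    cov = np.cov(Z, rowvar=False) * (1.0 / n1 + 1.0 / (n - n1))
    return float(diff @ np.linalg.solve(cov, diff))

def rerandomize(Z, n_treated, a, seed=0, max_draws=10_000):
    # Redraw complete randomizations until the imbalance is at most a.
    rng = np.random.default_rng(seed)
    n = Z.shape[0]
    for _ in range(max_draws):
        w = np.zeros(n, dtype=int)
        w[rng.choice(n, size=n_treated, replace=False)] = 1
        if mahalanobis_imbalance(Z, w) <= a:
            return w
    raise RuntimeError("no acceptable allocation found; consider a larger threshold a")

rng = np.random.default_rng(0)
Z = rng.standard_normal((100, 3))  # hypothetical baseline covariates
w = rerandomize(Z, n_treated=50, a=1.0)
print(w.sum(), mahalanobis_imbalance(Z, w))
```

Smaller thresholds give tighter covariate balance at the cost of more redraws and a more restricted set of admissible allocations.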
In bipartite causal inference with interference, interventional units might receive treatment or control, and they might affect the outcome of outcome units through their connections on a bipartite network. We study bipartite causal…
We consider the problem of boundary detection for areal data, focusing on situations where for each areal unit multiple observations are available. We propose a Bayesian nonparametric mixture model for the area-specific population…
Despite advances in representation learning, high-dimensional classification remains challenging in low-sample-size regimes, where the dominant signal may vary across applications and labeled data are often limited. We propose a…
In the eleventh and twelfth centuries in England, Wales and Normandy, Royal Acta were legal documents in which witnesses were listed in order of social status. Any bishops present were listed as a group. For our purposes, each witness-list…