Statistical Methodology
In many applications, researchers seek to identify overlapping entities across multiple data files. Record linkage algorithms facilitate this task in the absence of unique identifiers. As these algorithms rely on semi-identifying…
We consider the problem of estimating fold-changes in the expected value of a multivariate outcome observed with unknown sample-specific and category-specific perturbations. This challenge arises in high-throughput sequencing studies of the…
ProfileGLMM is an R package integrating Generalised Linear Mixed Models (GLMMs) as the outcome model for Bayesian profile regression. This statistical framework simultaneously i) explains the variation in the outcome and ii) clusters the…
Randomized controlled trials (RCTs) are often underpowered to detect treatment heterogeneity in subgroups defined by cross-classifications of multiple covariates, due to sparse sample sizes in some strata. External RCT data can help, but…
Missing confounders are common in observational studies and present fundamental challenges for causal effect estimation by weakening identification and increasing sensitivity to model misspecification. Within the missing-indicator…
Overall survival (OS) is the gold standard for assessing patient benefit and cost-effectiveness of new cancer drugs. However, it is often difficult to use OS as the primary endpoint in randomized clinical trials (RCTs) for patients with…
The zero-inflated logistic regression model accommodates binary responses with excess zeros, which often arise from a latent mixture of susceptible and insusceptible subpopulations or asymmetric misclassification of the response. The model…
We present a general nonparametric approach for testing whether a statistical parameter defined through conditional distributions is constant across the conditioning variables. Such hypotheses arise naturally in problems such as assessing…
In many statistical applications, particularly in clinical studies, hypotheses may carry different levels of importance, motivating the use of weighted multiple testing procedures (wMTPs) to control the familywise error rate (FWER). Among…
Network meta-analysis of diagnostic test accuracy (NMA-DTA) is a relatively new field, involving combining evidence across studies to evaluate and compare the accuracy of different tests for a given condition. However, the methods proposed…
Learning about causal effects in target populations and their subsets may be facilitated by combining information from multiple sources. One major class of study designs that combine information involves appending an index study with data…
In many applications, the data lie on a type of cone, where there is a distinction between an overall scale variable and the remaining scale-free structure. For example, the joint size and shape of objects are points on a cone, where size…
When variable selection methods are applied to bootstrapped and multiply imputed datasets, the set of selected variables typically varies across iterations. Aggregating results via the union rule can lead to overly dense models. We propose…
In this paper, we propose a Bayesian matrix-variate spatiotemporal modeling framework for jointly analyzing multiple response variables observed at spatial locations over time. The approach relaxes the standard assumption of spatial…
Spatial generalized linear mixed-effects models are widely used to analyze spatially indexed univariate responses. However, with modern technology, it is common to observe vector-valued mixed-type responses, e.g., a combination of…
Random survival forests are widely used for estimating covariate-conditional survival functions under right-censoring. Their standard log-rank splitting criterion is typically recomputed at each candidate split. This O(M) cost per split,…
The ratio of two densities provides a direct characterization of their differences. We consider the two-sample comparison problem by estimating this ratio given i.i.d. observations from two distributions. To this end, we propose additive…
Accurate transfer of information across multiple sectors to enhance model estimation is both significant and challenging in multi-sector portfolio optimization involving a large number of assets in different classes. Within the framework of…
Identifying spatially contiguous clusters and repeated spatial patterns (RSP) characterized by similar underlying distributions that are spatially apart is a key challenge in modern spatial statistics. Existing constrained clustering…
We propose a method that combines the closed testing framework with the concept of safe anytime-valid inference (SAVI) to compute lower confidence bounds for the true discovery proportion in a multiple testing setting. The proposed…