Statistical Methodology
Flexible random scale-mixture models provide a framework for capturing a broad range of extremal dependence structures. However, likelihood-based inference under the peaks-over-threshold setting is often computationally infeasible, due to…
Model-assisted regression estimation is fundamental in survey sampling for incorporating auxiliary information. However, when the auxiliary dimension grows with the sample size, the standard generalized regression (GREG) estimator can…
Supervised machine learning assumes that labeled data provide accurate measurements of the concepts models are meant to learn. Yet in practice, human labeling introduces systematic variation arising from ambiguous items, divergent…
This paper develops a copula-based time-series framework for modelling sovereign credit rating activity and its dependence dynamics, with extensions incorporating climate risk. We introduce a mixed-difference transformation that maps…
In Mendelian randomization (MR) studies, genetic variants are used as instrumental variables (IVs) to investigate causal relationships between exposures and outcomes based on observational data. However, numerous genetic studies have shown…
Estimation of covariance matrices is a fundamental problem in multivariate statistics. Recently, growing efforts have focused on incorporating covariate effects into these matrices, facilitating subject-specific estimation. Despite these…
Rerandomization is an experimental design technique that repeatedly randomizes treatment assignments until covariates are balanced between treatment groups. Rerandomization in the design stage of an experiment can lead to many asymptotic…
Estimating covariance parameters for multivariate spatial Gaussian random fields is computationally challenging, as the number of parameters grows rapidly with the number of variables, and likelihood evaluation requires operations of order…
Spatial time series (STS) data are fundamental to climate science, yet conventional approaches often conflate temporal co-evolution with genuine spatial dependence, obscuring subtle but critical climatic anomalies. We introduce a Random…
High-dimensional variable selection, particularly in genomics, requires error-controlling procedures that scale to millions of predictors. The Terminating-Random Experiments (T-Rex) selector achieves false discovery rate (FDR) control by…
We introduce Poisson-response tensor-on-tensor regression (PToTR), a novel regression framework designed to handle tensor responses composed element-wise of random Poisson-distributed counts. Tensors, or multi-dimensional arrays, composed…
Compositional time series frequently exhibit structural breaks due to external shocks, policy changes, or market disruptions. Standard methods either ignore such breaks or handle them through fixed effects that cannot extrapolate beyond the…
Count-compositional data arise in many different fields, including high-throughput sequencing experiments, ecological surveys, and palaeoclimate studies, where a common, important goal is to understand how covariates relate to the observed…
Discrete random probability measures are central to Bayesian inference, particularly as priors for mixture modeling and clustering. A broad and unifying class is that of proper species sampling processes (SSPs), encompassing many Bayesian…
Assessing whether two patient populations exhibit comparable event dynamics is essential for evaluating treatment equivalence, pooling data across cohorts, or comparing clinical pathways across hospitals or strategies. We introduce a…
This paper tackles the challenge of performing multiple quantile regressions across different quantile levels and the associated problem of controlling the familywise error rate, an issue that is generally overlooked in practice. We propose…
In randomized experiments, covariates are often used to reduce variance and improve the precision of treatment effect estimates. However, in many real-world settings, interference between units, where one unit's treatment affects another's…
We design a debiased parametric bootstrap framework for statistical inference from differentially private data. Existing usage of the parametric bootstrap on privatized data ignored or avoided handling possible biases introduced by the…
Large language models (LLMs) are increasingly used to generate labels from radiology reports to enable large-scale AI evaluation. However, label noise from LLMs can introduce bias into performance estimates, especially under varying disease…
In test equating, ensuring score comparability across different test forms is crucial but particularly challenging when test groups are non-equivalent and no anchor test is available. Local test equating aims to satisfy Lord's equity…