Statistical Methodology
Increasing evidence suggests that variability in longitudinal biomarkers, in addition to their mean trajectory, carries prognostic information for time-to-event outcomes. However, standard joint models typically capture only the expected…
In placebo-controlled randomized trials, the post-randomization use of concomitant medications may be higher in the placebo arm than in the treatment arm. This may dilute the full benefits of the randomized drug as estimated by the…
We propose a framework for determining whether the causal dependence of an outcome $Y$ on a covariate $X$ changes at a given time point, given confounders $\boldsymbol{Z}$. For instance, in financial markets, the effect of a market…
Double machine learning (DML) delivers valid inference on low-dimensional causal parameters while permitting flexible nuisance estimation, but its computational cost becomes prohibitive once cross-fitted learners must be trained on massive…
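For reference, here is a minimal sketch of standard cross-fitted DML for the partially linear model $Y = \theta D + g(X) + \varepsilon$. It uses the partialling-out estimator $\hat\theta = \sum_i \tilde v_i \tilde u_i / \sum_i \tilde v_i^2$, where $\tilde u_i$ and $\tilde v_i$ are out-of-fold residuals of $Y$ and $D$ on $X$. The sketch makes some assumptions. It uses a scalar covariate, and univariate least squares stands in for a flexible machine-learning learner. The names `ols_1d` and `dml_plr` are hypothetical. This is not the accelerated procedure the abstract proposes. It only shows where the repeated nuisance fits, the cost the abstract refers to, come from.

```python
import random
from statistics import fmean

def ols_1d(x, y):
    """Stand-in nuisance learner (assumption): fit y ~ a + b*x, return a predictor."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return lambda z: a + b * z

def dml_plr(y, d, x, n_folds=5, learner=ols_1d, seed=0):
    """Cross-fitted partialling-out estimate of theta in Y = theta*D + g(X) + eps."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    y_res, d_res = [0.0] * n, [0.0] * n
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        # Nuisances are trained on the other folds only (cross-fitting).
        ell = learner([x[i] for i in train], [y[i] for i in train])  # E[Y | X]
        m = learner([x[i] for i in train], [d[i] for i in train])    # E[D | X]
        for i in fold:
            y_res[i] = y[i] - ell(x[i])
            d_res[i] = d[i] - m(x[i])
    # Regress outcome residuals on treatment residuals.
    return sum(u * v for u, v in zip(y_res, d_res)) / sum(v * v for v in d_res)
```

Every fold refits both nuisance models on roughly $(K-1)/K$ of the data. This is why cross-fitting becomes expensive once the learners themselves are costly to train.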
The role of AI-generated synthetic data has recently been expanded to support realistic Monte Carlo simulations. However, guidance is limited on generating data with multilevel structures and designing simulations based on such data. This…
The discrete Pareto (also called Zeta or Zipf) distribution arises naturally in modeling rank-frequency data across diverse fields such as linguistics, demography, biology, and computer science. Despite its widespread applicability, goodness-of-fit…
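One common parameterization of the Zeta distribution has probability mass function $P(X = k) = k^{-s}/\zeta(s)$ for $k = 1, 2, \dots$ and $s > 1$. Here $\zeta$ is the Riemann zeta function. The sketch below is a minimal illustration of that model, not the abstract's goodness-of-fit test. It evaluates $\zeta(s)$ with a truncated sum plus an Euler–Maclaurin tail correction, and uses it for the pmf and the i.i.d. log-likelihood. The function names are hypothetical.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def riemann_zeta(s: float, n_terms: int = 1000) -> float:
    """Approximate zeta(s), s > 1: direct sum of k^{-s} for k < n_terms plus an
    Euler-Maclaurin estimate of the tail sum over k >= n_terms."""
    if s <= 1:
        raise ValueError("zeta(s) diverges for s <= 1")
    n = n_terms
    head = math.fsum(k ** -s for k in range(1, n))
    # integral of x^{-s} from n to infinity, + f(n)/2, - f'(n)/12
    tail = n ** (1 - s) / (s - 1) + 0.5 * n ** -s + s * n ** (-s - 1) / 12
    return head + tail

def zeta_pmf(k: int, s: float) -> float:
    """P(X = k) = k^{-s} / zeta(s) on the positive integers."""
    if k < 1:
        return 0.0
    return k ** -s / riemann_zeta(s)

def zeta_loglik(data, s: float) -> float:
    """Log-likelihood of i.i.d. positive-integer observations under Zeta(s)."""
    return -s * math.fsum(math.log(k) for k in data) - len(data) * math.log(riemann_zeta(s))
```

Because `riemann_zeta` is cached, evaluating `zeta_loglik` over a grid of $s$ values to get a crude maximum-likelihood estimate is cheap. Such a fitted model is the starting point that a goodness-of-fit test would then assess.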
Differential item functioning (DIF) arises alongside latent population heterogeneity in many applications, and both must be accounted for when assessing measurement invariance. In many practical settings, however, the comparison groups are…
Machine-learning systems used in survey-based social measurement require uncertainty estimates that are reliable across population subgroups, not merely valid in aggregate. We study ordinal conformal prediction for five-level AI-attitude…
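Subgroup-level reliability is often illustrated with group-conditional (Mondrian) split conformal prediction, shown in the sketch below. It rests on several assumptions. A model outputs a point prediction $\hat y$ on the 1 to 5 scale. The nonconformity score is $|y - \hat y|$. The conformal threshold is computed separately within each subgroup from held-out calibration data. Under exchangeability within each group, the resulting sets cover the true level with probability at least $1 - \alpha$ in every group. This is one standard technique and not necessarily the method studied in the abstract. The function names are hypothetical.

```python
import math
from collections import defaultdict

def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:  # too few calibration points: return the trivial (full) set
        return math.inf
    return sorted(scores)[rank - 1]

def calibrate(preds, labels, groups, alpha=0.1):
    """Per-group thresholds for the ordinal score |y - y_hat|."""
    by_group = defaultdict(list)
    for p, y, g in zip(preds, labels, groups):
        by_group[g].append(abs(y - p))
    return {g: conformal_quantile(s, alpha) for g, s in by_group.items()}

def prediction_set(pred, group, thresholds, levels=range(1, 6)):
    """Contiguous set of Likert levels within the group's threshold of pred."""
    q = thresholds[group]
    return [k for k in levels if abs(k - pred) <= q]
```

Because the score is a distance on the ordinal scale, every prediction set is a contiguous range of levels. Small subgroups naturally receive wider, more conservative sets.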
Sparse regression models based on global-local shrinkage priors are increasingly used for Bayesian modeling of modern high-dimensional data, but scaling up the Gibbs sampler for posterior inference remains a challenge. While much effort has gone…
We formally introduce a class of models inspired by renormalization group (RG) theory, built on additive hierarchical expansions analogous to those appearing in functional ANOVA and mixed-effects models. Like ReLU convolutional neural…
High-dimensional functional data are becoming increasingly common in fields such as environmental monitoring and neuroimaging. This paper studies high-dimensional functional linear regression models that relate a scalar response to…
In large observational studies, the case-cohort design is commonly used to reduce the cost associated with covariate measurement. For survival outcomes, the literature has suggested that the restricted mean survival time (RMST) be a more…
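The restricted mean survival time up to a horizon $\tau$ is the area under the survival curve, $\mathrm{RMST}(\tau) = \int_0^\tau S(t)\,dt$. A minimal sketch integrates a Kaplan–Meier step function over $[0, \tau]$. The optional `weights` argument is an assumption. It stands in for case-cohort sampling weights, for example inverse sampling probabilities, and is not the estimator the abstract develops. `km_rmst` is a hypothetical name.

```python
def km_rmst(times, events, tau, weights=None):
    """Area under the (optionally weighted) Kaplan-Meier curve on [0, tau].

    times: observed times; events: 1 if the event occurred, 0 if censored;
    weights: optional per-subject weights (hypothetical, e.g. inverse sampling
    probabilities in a case-cohort subsample); defaults to 1 for everyone.
    """
    w = [1.0] * len(times) if weights is None else list(weights)
    event_times = sorted({t for t, d in zip(times, events) if d and t <= tau})
    surv, prev_t, area = 1.0, 0.0, 0.0
    for t in event_times:
        area += surv * (t - prev_t)  # S(t) is constant on [prev_t, t)
        at_risk = sum(wi for ti, wi in zip(times, w) if ti >= t)
        died = sum(wi for ti, di, wi in zip(times, events, w) if ti == t and di)
        surv *= 1.0 - died / at_risk  # Kaplan-Meier product-limit step
        prev_t = t
    return area + surv * (tau - prev_t)  # last flat piece up to tau
```

Without censoring, this equals the sample mean of $\min(T_i, \tau)$. For example, times 1, 2, 3, 4 with $\tau = 4$ give 2.5.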
High-dimensional spatially correlated covariates are common in regression models encountered in environmental sciences and other fields. In such models, the regression coefficients often exhibit a sparse structure with spatial dependence.…
Covariance matrix outcomes arise naturally in neuroimaging experiments to study brain functional connectivity. It is also of interest to understand how brain network organization varies with subject-level covariates. Existing covariance…
Advances in sensing technology have made it possible to collect large volumes of high-dimensional time-series data. In fields like genetics and neuroscience, key questions concern whether directed relationships between variables can be…
Pairwise comparisons from multiple judges are central to large language model evaluation and preference modeling, yet standard ranking pipelines often pool judgments into a single score vector, treating systematic judge disagreement as…
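A typical pooled pipeline fits a single Bradley–Terry model, $P(i \text{ beats } j) = p_i/(p_i + p_j)$, to all judges' comparisons at once. The sketch below shows that pooled baseline, which the abstract critiques, and not its proposed method. It fits the model with the standard minorization-maximization update $p_i \leftarrow W_i / \sum_{j} n_{ij}/(p_i + p_j)$. It assumes every item both wins and loses at least once, so that the maximum-likelihood estimate exists. `bradley_terry` is a hypothetical name.

```python
from collections import defaultdict

def bradley_terry(comparisons, n_iter=200, tol=1e-10):
    """Pooled Bradley-Terry scores (summing to 1) from (winner, loser) pairs."""
    wins = defaultdict(float)
    pair_counts = defaultdict(float)
    items = set()
    for w, l in comparisons:
        if w == l:
            raise ValueError("self-comparison")
        wins[w] += 1
        pair_counts[frozenset((w, l))] += 1
        items.update((w, l))
    p = {i: 1.0 for i in items}
    for _ in range(n_iter):
        new = {}
        for i in items:
            denom = 0.0
            for pair, c in pair_counts.items():
                if i in pair:
                    (j,) = pair - {i}
                    denom += c / (p[i] + p[j])
            new[i] = wins[i] / denom  # MM update
        total = sum(new.values())
        new = {i: v / total for i, v in new.items()}  # fix the scale
        converged = max(abs(new[i] - p[i]) for i in items) < tol
        p = new
        if converged:
            break
    return p
```

Because every judgment enters the same win counts $W_i$ and pair counts $n_{ij}$, systematic disagreement between judges is averaged away in the single score vector.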
Model selection plays an important role in longitudinal data analysis, especially when models are estimated using the generalized method of moments (GMM) in the presence of time-dependent covariates. In this setting, the number of valid…
Compositional data, which are vectors of proportions constrained to the probability simplex, arise frequently in modern scientific applications, including microbiome relative abundances across body sites and cell-type mixture weights…
The log-logistic distribution is a flexible distribution that can model a wide range of failure patterns in electrical, electronic, and mechanical engineering, and it is often used in reliability inference. However, inference for the…
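The flexibility mentioned here shows up in the hazard function. With scale $\alpha > 0$ and shape $\beta > 0$, the reliability is $R(t) = 1/(1 + (t/\alpha)^\beta)$ and the hazard is $h(t) = (\beta/\alpha)(t/\alpha)^{\beta-1}/(1 + (t/\alpha)^\beta)$. The hazard is decreasing for $\beta \le 1$. For $\beta > 1$ it first increases and then decreases, peaking at $t = \alpha(\beta - 1)^{1/\beta}$. The sketch below uses this common $(\alpha, \beta)$ parameterization, which is an assumption since the abstract does not state one. The function names are hypothetical.

```python
def loglogistic_reliability(t, alpha, beta):
    """R(t) = P(T > t) = 1 / (1 + (t/alpha)^beta), t >= 0."""
    return 1.0 / (1.0 + (t / alpha) ** beta)

def loglogistic_pdf(t, alpha, beta):
    """f(t) = (beta/alpha)(t/alpha)^(beta-1) / (1 + (t/alpha)^beta)^2, t > 0."""
    z = (t / alpha) ** beta
    return (beta / alpha) * (t / alpha) ** (beta - 1) / (1.0 + z) ** 2

def loglogistic_hazard(t, alpha, beta):
    """h(t) = f(t) / R(t), t > 0."""
    z = (t / alpha) ** beta
    return (beta / alpha) * (t / alpha) ** (beta - 1) / (1.0 + z)

def loglogistic_quantile(p, alpha, beta):
    """Inverse CDF: t_p = alpha * (p / (1 - p))^(1/beta); the median is alpha."""
    return alpha * (p / (1.0 - p)) ** (1.0 / beta)
```

With $\beta = 3$ and $\alpha = 1$, the hazard rises from about 0.03 at $t = 0.1$ to 1.5 at $t = 1$, then falls to about 0.3 at $t = 10$. This is the non-monotone pattern that constant- or monotone-hazard models cannot capture.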
We propose the covariate-balanced-and-adjusted response-adaptive randomization (CBARA) procedure for adaptive design in clinical trials, which integrates the complementary strengths of covariate-adjusted response-adaptive randomization…