Statistical Methodology
Estimating individualized treatment rules (ITRs) is fundamental to precision medicine, where the goal is to tailor treatment decisions to individual patient characteristics. While numerous methods have been developed for ITR estimation,…
Estimating conditional average treatment effects (CATE) from randomized controlled trials (RCTs) and generalizing them to broader populations is essential for personalizing treatment rules but is complicated by selection bias due to trial…
Mixed-effects models are widely used to model data with hierarchical grouping structures and high-cardinality categorical predictor variables. However, for high-dimensional crossed random effects, current standard computations relying on…
We define generalized innovations associated with generalized error models having arbitrary distributions, that is, distributions that can be mixtures of continuous and discrete distributions. These models include stochastic volatility…
Building artificially intelligent geospatial systems requires rapid delivery of spatial data analysis on massive scales with minimal human intervention. Depending upon their intended use, data analysis can also involve model assessment and…
The test-negative design has become popular for evaluating the effectiveness of post-licensure vaccines using observational data. In addition to its logistical convenience in data collection, the design is also believed to control for the…
High-dimensional clustering often relies on geometric or local-similarity structure, but the dominant separation between groups may not always be location-based. Differences in dispersion can create asymmetric local-neighborhood patterns:…
Beta regression models are used to model continuous response variables in the unit interval, such as rates, percentages, or proportions. They are applied in several areas, including medicine, environmental research, finance, and natural…
The main purpose of this paper is to introduce a new class of regression models for bounded continuous data, commonly encountered in applied research. The models, named the power logit regression models, assume that the response variable…
This paper develops a macroscopic, activity-based model of urban active mobility using nonintrusive sensor data. It introduces attendance functions to describe spatio-temporal travel patterns between activities and formulates the…
With the growing application of spatial predictive modeling in ecology, the question of how to appropriately evaluate the resulting maps has gained increasing attention. While there is consensus that map accuracy is ideally estimated using…
Causal discovery methods aim to infer causal direction from observational data. Functional causal discovery approaches use structural asymmetries to identify causal directionality but rely on strong modeling assumptions and provide limited…
The increasing availability of experimental data has intensified interest in calibrating stochastic models, raising fundamental questions about parameter identifiability. Structural identifiability determines whether parameters can be…
A median-radius framework for assessing centrality in multivariate data using median distances is proposed. Based on the proposed framework, a scale-invariant measure of radial dispersion is defined and used to establish a depth function…
Selection bias is pervasive in observational studies. For example, large-scale biobank data can exhibit "healthy volunteer bias" when respondents are healthier and of higher socio-economic status than the population they are meant to…
Many pre-trained models (PTMs) are available in modern applications. Because different PTMs are often trained on different datasets, their performance can vary substantially across new tasks, and the ranking of the candidates may…
Inference for models with recursively defined likelihoods is computationally demanding, limiting scalability to large datasets. We propose a stabilised weighted subsampling methodology for accelerated inference based on an unbiased…
Applied researchers in biomedicine and related fields are often interested in estimating the causal effect of a treatment or intervention. Although randomized clinical trials are considered the gold standard for establishing causal effects,…
This note addresses a key limitation of the Folding Test of Unimodality (FTU). In specific univariate mixture settings, the folding-based criterion can systematically fail, misclassifying clearly multimodal distributions as unimodal. We…
This paper investigates the predictive performance of model averaging in high-dimensional linear regression where the number of regressors is comparable to the sample size. We demonstrate that the double descent trajectory manifests within…