Related papers: Efficient Case-Cohort Design using Balanced Sampli…
The case-cohort design is a commonly used cost-effective sampling strategy for large cohort studies, where some covariates are expensive to measure or obtain. In this paper, we consider regression analysis under a case-cohort study with…
The case-cohort design allows analysis of multiple endpoints and only requires covariates to be measured for cases and non-cases in a random subcohort from the cohort. Stratification of subcohort sampling and weight calibration increase…
The use of massive survival data has become common in survival analysis. In this study, a subsampling algorithm is proposed for the Cox proportional hazards model with time-dependent covariates when the sample is extraordinarily large but…
In the realm of contemporary data analysis, the use of massive datasets has taken on heightened significance, albeit often entailing considerable demands on computational time and memory. While a multitude of existing works offer optimal…
Case-cohort design, an outcome-dependent sampling design for censored survival data, is increasingly used in biomedical research. The development of asymptotic theory for a case-cohort design in the current literature primarily relies on…
Important objectives in cancer research are the prediction of a patient's risk based on molecular measurements such as gene expression data and the identification of new prognostic biomarkers (e.g. genes). In clinical practice, this is…
The case-cohort design obtains complete covariate data only on cases and on a random sample (the subcohort) of the entire cohort. Subsequent publications described the use of stratification and weight calibration to increase efficiency of…
The Cox proportional hazards model is widely used in survival analysis to model time-to-event data. However, it faces significant computational challenges in the era of large-scale data, particularly when dealing with time-dependent…
Supervised learning under measurement constraints is a common challenge in statistical and machine learning. In many applications, despite extensive design points, acquiring responses for all points is often impractical due to resource…
Two-phase sampling designs are frequently employed in epidemiological studies and large-scale health surveys. In such designs, certain variables are exclusively collected within a second-phase random subsample of the initial first-phase…
We explore whether survival model performance in underrepresented high- and low-risk subgroups - regions of the prognostic spectrum where clinical decisions are most consequential - can be improved through targeted restructuring of the…
Massive-sized survival datasets are becoming increasingly prevalent with the development of the healthcare industry. Such datasets pose computational challenges unprecedented in traditional survival analysis use-cases. A popular way for…
An important task in clinical medicine is the construction of risk prediction models for specific subgroups of patients based on high-dimensional molecular measurements such as gene expression data. Major objectives in modeling…
The case-cohort study design bypasses resource constraints by collecting certain expensive covariates for only a small subset of the full cohort. Weighted Cox regression is the most widely used approach for analysing case-cohort data within…
Two-phase sampling designs have been widely adopted in epidemiological studies to reduce costs when measuring certain biomarkers is prohibitively expensive. Under these designs, investigators commonly relate survival outcomes to risk…
Data collection costs can vary widely across variables in data science tasks. Two-phase designs can be employed to save data collection costs. This paper considers the two-phase studies where inexpensive variables are collected for all…
Causal inference starts with a simple idea: compare groups that differ by treatment, not much else. Traditionally, similar groups are constructed using only observed covariates; however, it remains a long-standing challenge to incorporate…
For classification problems with significant class imbalance, subsampling can reduce computational costs at the price of inflated variance in estimating model parameters. We propose a method for subsampling efficiently for logistic…
We propose Distributionally Balanced Designs (DBD), a new class of probability sampling designs that target representativeness at the level of the full auxiliary distribution rather than selected moments. In disciplines such as ecology,…
A balanced sampling design should always be the adopted strategy if auxiliary information is available. Besides, integrating a stratified structure of the population in the sampling process can considerably reduce the variance of the…