Related papers: Efficient Case-Cohort Design using Balanced Sampli…

Improving estimation efficiency of case-cohort study with interval-censored failure time data

The case-cohort design is a commonly used cost-effective sampling strategy for large cohort studies, where some covariates are expensive to measure or obtain. In this paper, we consider regression analysis under a case-cohort study with…

Methodology · Statistics 2023-10-24 Qingning Zhou, Kin Yau Wong

CaseCohortCoxSurvival: an R Package for Case-Cohort Inference for Relative Hazard and Pure Risk under the Cox Model

The case-cohort design allows analysis of multiple endpoints and only requires covariates to be measured for cases and non-cases in a random subcohort from the cohort. Stratification of subcohort sampling and weight calibration increase…

Applications · Statistics 2024-02-15 Lola Etievant, Mitchell H. Gail

Optimal subsampling for the Cox proportional hazards model with massive survival data

The use of massive survival data has become common in survival analysis. In this study, a subsampling algorithm is proposed for the Cox proportional hazards model with time-dependent covariates when the sample is extraordinarily large but…

Computation · Statistics 2023-02-07 Nan Qiao, Wangcheng Li, Feng Xiao, Cunjie Lin, Yong Zhou

Mastering Rare Event Analysis: Optimal Subsample Size in Logistic and Cox Regressions

In the realm of contemporary data analysis, the use of massive datasets has taken on heightened significance, albeit often entailing considerable demands on computational time and memory. While a multitude of existing works offer optimal…

Methodology · Statistics 2024-06-21 Tal Agassi, Nir Keret, Malka Gorfine

A general semiparametric Z-estimation approach for case-cohort studies

Case-cohort design, an outcome-dependent sampling design for censored survival data, is increasingly used in biomedical research. The development of asymptotic theory for a case-cohort design in the current literature primarily relies on…

Statistics Theory · Mathematics 2012-04-13 Bin Nan, Jon A. Wellner

Combining heterogeneous subgroups with graph-structured variable selection priors for Cox regression

Important objectives in cancer research are the prediction of a patient's risk based on molecular measurements such as gene expression data and the identification of new prognostic biomarkers (e.g. genes). In clinical practice, this is…

Applications · Statistics 2020-04-17 Katrin Madjar, Manuela Zucknick, Katja Ickstadt, Jörg Rahnenführer

Cox model inference for relative hazard and pure risk from stratified weight-calibrated case-cohort data

The case-cohort design obtains complete covariate data only on cases and on a random sample (the subcohort) of the entire cohort. Subsequent publications described the use of stratification and weight calibration to increase efficiency of…

Methodology · Statistics 2023-04-10 Lola Etievant, Mitchell H. Gail

Moment-assisted subsampling method for Cox proportional hazards model with large-scale data

The Cox proportional hazards model is widely used in survival analysis to model time-to-event data. However, it faces significant computational challenges in the era of large-scale data, particularly when dealing with time-dependent…

Methodology · Statistics 2025-01-14 Miaomiao Su, Ruoyu Wang

Balanced Subsampling for Big Data with Categorical Covariates

Supervised learning under measurement constraints is a common challenge in statistical and machine learning. In many applications, despite extensive design points, acquiring responses for all points is often impractical due to resource…

Methodology · Statistics 2025-03-19 Lin Wang

Using Model-Assisted Calibration Methods to Improve Efficiency of Regression Analyses with Two-Phase Samples under Complex Survey Designs

Two-phase sampling designs are frequently employed in epidemiological studies and large-scale health surveys. In such designs, certain variables are exclusively collected within a second-phase random subsample of the initial first-phase…

Methodology · Statistics 2024-03-25 Lingxiao Wang

Improving Survival Models in Healthcare by Balancing Imbalanced Cohorts: A Novel Approach

We explore whether survival model performance in underrepresented high- and low-risk subgroups - regions of the prognostic spectrum where clinical decisions are most consequential - can be improved through targeted restructuring of the…

Methodology · Statistics 2025-10-03 Catherine Ning, Dimitris Bertsimas, Johan Gagnière, Stefan Buettner, Per Eystein Loenning, Hideo Baba, Itaru Endo, Georgios Stasinos, Richard Burkhart, Federico N. Auecio, Felix Balzer, Cornelis Verhoef, Martin E. Kreis, Georgios Antonios Margonis

Optimal Cox Regression Subsampling Procedure with Rare Events

Massive sized survival datasets are becoming increasingly prevalent with the development of the healthcare industry. Such datasets pose computational challenges unprecedented in traditional survival analysis use-cases. A popular way for…

Methodology · Statistics 2023-05-09 Nir Keret, Malka Gorfine

Weighted Cox regression for the prediction of heterogeneous patient subgroups

An important task in clinical medicine is the construction of risk prediction models for specific subgroups of patients based on high-dimensional molecular measurements such as gene expression data. Major objectives in modeling…

Methodology · Statistics 2020-03-23 Katrin Madjar, Jörg Rahnenführer

A Bayesian framework for case-cohort Cox regression: application to dietary epidemiology

The case-cohort study design bypasses resource constraints by collecting certain expensive covariates for only a small subset of the full cohort. Weighted Cox regression is the most widely used approach for analysing case-cohort data within…

Methodology · Statistics 2021-09-10 Andrew Yiu, Robert J. B. Goudie, Stephen J. Sharp, Paul J. Newcombe, Brian D. M. Tom

Scalable and Efficient Multiple Imputation for Case-Cohort Studies via Influence Function-Based Supersampling

Two-phase sampling designs have been widely adopted in epidemiological studies to reduce costs when measuring certain biomarkers is prohibitively expensive. Under these designs, investigators commonly relate survival outcomes to risk…

Methodology · Statistics 2025-12-12 Jooho Kim, Yei Eun Shin

A maximin optimal approach for sampling designs in two-phase studies

Data collection costs can vary widely across variables in data science tasks. Two-phase designs can be employed to save data collection costs. This paper considers the two-phase studies where inexpensive variables are collected for all…

Methodology · Statistics 2025-12-04 Ruoyu Wang, Qihua Wang, Wang Miao

Cross-Balancing for Data-Informed Design and Efficient Analysis of Observational Studies

Causal inference starts with a simple idea: compare groups that differ by treatment, not much else. Traditionally, similar groups are constructed using only observed covariates; however, it remains a long-standing challenge to incorporate…

Methodology · Statistics 2025-11-21 Ying Jin, José Zubizarreta

Local case-control sampling: Efficient subsampling in imbalanced data sets

For classification problems with significant class imbalance, subsampling can reduce computational costs at the price of inflated variance in estimating model parameters. We propose a method for subsampling efficiently for logistic…

Computation · Statistics 2014-09-24 William Fithian, Trevor Hastie

Distributionally balanced sampling designs

We propose Distributionally Balanced Designs (DBD), a new class of probability sampling designs that target representativeness at the level of the full auxiliary distribution rather than selected moments. In disciplines such as ecology,…

Methodology · Statistics 2026-03-13 Anton Grafström, Wilmer Prentius

Enhanced Cube Implementation For Highly Stratified Population

A balanced sampling design should always be the adopted strategy if auxiliary information is available. Besides, integrating a stratified structure of the population in the sampling process can considerably reduce the variance of the…

Methodology · Statistics 2022-06-03 Raphaël Jauslin, Esther Eustache, Yves Tillé