Two-Sample Testing in High-Dimensional Models

Nicolas Städler; Sach Mukherjee

Methodology 2013-01-17 v2

Abstract

We propose a novel methodology for testing equality of model parameters between two high-dimensional populations. The technique is very general and applicable to a wide range of models. The method is based on sample splitting: the data are split into two parts; on the first part we reduce the dimensionality of the model to a manageable size; on the second part we perform significance testing (p-value calculation) based on a restricted likelihood ratio statistic. Assuming that both populations arise from the same distribution, we show that the restricted likelihood ratio statistic is asymptotically distributed as a weighted sum of chi-squares, with weights that can be efficiently estimated from the data. In high-dimensional problems, a single data split can result in a "p-value lottery", where the p-value depends strongly on the particular split. To ameliorate this effect, we iterate the splitting process and aggregate the resulting p-values. This multi-split approach provides improved p-values. We illustrate the use of our general approach in two-sample comparisons of high-dimensional regression models ("differential regression") and graphical models ("differential network"). In both cases we show results on simulated data as well as real data from recent, high-throughput cancer studies.
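
To make two steps of the procedure concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the weights of the chi-square mixture have already been estimated, and it approximates the null tail probability by Monte Carlo simulation. The abstract does not state which aggregation rule is used; the fixed-quantile rule below (twice the median p-value when `gamma=0.5`) is an assumption chosen for illustration. The function names `weighted_chisq_pvalue` and `aggregate_pvalues`, and all numbers in the usage lines, are hypothetical.

```python
import numpy as np


def weighted_chisq_pvalue(stat, weights, n_draws=100_000, seed=0):
    """Monte Carlo estimate of P(sum_j w_j * Z_j^2 >= stat), Z_j iid N(0, 1).

    `weights` are assumed to have been estimated beforehand.
    """
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    # Each row holds one draw of independent chi-square(1) variables;
    # the matrix product forms the weighted sum for every draw.
    draws = rng.chisquare(df=1, size=(n_draws, weights.size)) @ weights
    return float(np.mean(draws >= stat))


def aggregate_pvalues(pvals, gamma=0.5):
    """Assumed aggregation rule: min(1, gamma-quantile of p_b / gamma)."""
    pvals = np.asarray(pvals, dtype=float)
    return float(min(1.0, np.quantile(pvals / gamma, gamma)))


# Placeholder statistics for three hypothetical data splits
# (illustrative values only, not results from the paper).
split_stats = [4.1, 6.3, 2.8]
split_pvals = [
    weighted_chisq_pvalue(s, weights=[1.0, 0.5, 0.2], seed=b)
    for b, s in enumerate(split_stats)
]
print(aggregate_pvalues(split_pvals))
```

Aggregating over many splits means the reported p-value no longer hinges on one arbitrary partition of the data, which is the "p-value lottery" the abstract refers to.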

Keywords

hypothesis testing, survey sampling, high-dimensional regression

Cite

@article{arxiv.1210.4584,
  title  = {Two-Sample Testing in High-Dimensional Models},
  author = {Nicolas Städler and Sach Mukherjee},
  journal= {arXiv preprint arXiv:1210.4584},
  year   = {2013}
}

Comments

28 pages, 12 figures
