Inference on High-Dimensional Sparse Count Data

Jyotishka Datta; David B. Dunson

Methodology 2016-04-15 v2

Abstract

In a variety of application areas, there is growing interest in analyzing high-dimensional sparse count data, where sparsity is exhibited by an over-abundance of zeros and small non-zero counts. Existing approaches for analyzing multivariate count data via Poisson or negative binomial log-linear hierarchical models with zero-inflation cannot flexibly adapt to the level and nature of sparsity in the data. We develop a new class of continuous local-global shrinkage priors tailored for sparse counts. Theoretical properties are assessed, including posterior concentration, stronger control on false discoveries in multiple testing, robustness of the posterior mean, and super-efficiency in estimating the sampling density. Simulation studies illustrate excellent small-sample properties relative to competitors. We apply the method to detect rare mutational hotspots in exome sequencing data and to identify the cities most impacted by terrorism.

Keywords

high-dimensional regression; Bayesian shrinkage; sufficient dimension reduction

Cite

@article{arxiv.1510.04320,
  title  = {Inference on High-Dimensional Sparse Count Data},
  author = {Jyotishka Datta and David B. Dunson},
  journal= {arXiv preprint arXiv:1510.04320},
  year   = {2016}
}

Comments

20 pages, 7 figures, 2 tables. (This version has a new result regarding tighter control on false discoveries and another real data example. Additional proofs and examples are given in the supplementary file.)
