A Communication-Efficient Parallel Method for Group-Lasso

Binghong Chen; Jun Zhu

Machine Learning, 2016-12-08, v1

Authors: Binghong Chen, Jun Zhu

Abstract

Group-Lasso (gLasso) identifies important explanatory factors for predicting a response variable by exploiting the grouping structure over the input variables. However, most existing gLasso algorithms do not scale to large datasets, which are becoming the norm in many applications. In this paper, we present a divide-and-conquer parallel algorithm (DC-gLasso) that scales up gLasso for regression tasks with grouping structures. DC-gLasso needs only two iterations to collect and aggregate the local estimates computed on subsets of the data, and it provably recovers the true model under certain conditions. We further extend it to handle overlapping groups. Empirical results on a wide range of synthetic and real-world datasets show that DC-gLasso significantly improves time efficiency without sacrificing regression accuracy.
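
To make the two-round idea concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state how local estimates are aggregated, so this sketch assumes a majority vote over locally selected groups in the first round and an averaged least-squares refit in the second. The function names (`group_lasso`, `dc_group_lasso`), the sqrt(group size) penalty weights, the proximal-gradient solver, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Block soft-thresholding: shrink the whole group vector v toward zero by t."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal gradient for (1/2n)||y - Xb||^2 + lam * sum_g sqrt(p_g) * ||b_g||_2."""
    n, p = X.shape
    step = n / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y) / n)  # gradient step
        for g in groups:  # proximal step, applied group by group
            beta[g] = group_soft_threshold(z[g], step * lam * np.sqrt(len(g)))
    return beta

def dc_group_lasso(X, y, groups, lam, n_workers=4):
    """Two-round divide-and-conquer sketch (hypothetical helper, assumed aggregation)."""
    parts = np.array_split(np.arange(X.shape[0]), n_workers)
    # Round 1: each worker fits a local group lasso and reports its active groups.
    votes = np.zeros(len(groups))
    for idx in parts:
        beta_k = group_lasso(X[idx], y[idx], groups, lam)
        votes += np.array([np.any(beta_k[g] != 0) for g in groups])
    # Assumption: keep a group if more than half of the workers selected it.
    selected = [g for g, v in zip(groups, votes) if v > n_workers / 2]
    cols = [j for g in selected for j in g]
    # Round 2: each worker refits least squares on the voted groups; average them.
    beta = np.zeros(X.shape[1])
    if cols:
        local = [np.linalg.lstsq(X[idx][:, cols], y[idx], rcond=None)[0]
                 for idx in parts]
        beta[cols] = np.mean(local, axis=0)
    return beta, selected

# Synthetic example: 4 groups of 3 variables, only the first group is active.
rng = np.random.default_rng(0)
groups = [list(range(i, i + 3)) for i in range(0, 12, 3)]
beta_true = np.zeros(12)
beta_true[0:3] = [2.0, -1.5, 1.0]
X = rng.standard_normal((400, 12))
y = X @ beta_true + 0.1 * rng.standard_normal(400)
beta_hat, selected = dc_group_lasso(X, y, groups, lam=0.2)
print("selected groups:", selected)
```

In this sketch each worker sends only a vector of group votes and one coefficient vector back to the coordinator, which is what keeps communication low. In a real deployment the loops over `parts` would run on separate machines.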

Keywords

parallel algorithm, randomized algorithm, Gaussian process

Cite

@article{arxiv.1612.02222,
  title  = {A Communication-Efficient Parallel Method for Group-Lasso},
  author = {Binghong Chen and Jun Zhu},
  journal= {arXiv preprint arXiv:1612.02222},
  year   = {2016}
}

Comments

7 pages
