A Communication-Efficient Parallel Method for Group-Lasso

Machine Learning, 2016-12-08, v1

Abstract

Group-Lasso (gLasso) identifies important explanatory factors for predicting a response variable by exploiting the grouping structure over the input variables. However, most existing gLasso algorithms do not scale to the large datasets that are becoming the norm in many applications. In this paper, we present a divide-and-conquer parallel algorithm (DC-gLasso) that scales up gLasso for regression tasks with grouping structure. DC-gLasso needs only two iterations to collect and aggregate the local estimates computed on subsets of the data, and it provably recovers the true model under certain conditions. We further extend it to handle overlapping groups. Empirical results on a wide range of synthetic and real-world datasets show that DC-gLasso significantly improves time efficiency without sacrificing regression accuracy.
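The abstract does not spell out what the two rounds of communication contain. The sketch below illustrates one plausible reading, which is an assumption and not the authors' stated procedure: in the first round each worker fits a group lasso on its own subset and the selected groups are combined by majority vote; in the second round each worker refits ordinary least squares on the voted groups and the coefficients are averaged. The function names `group_lasso` and `dc_group_lasso_sketch`, the proximal-gradient solver, the majority-vote rule, and the sequential loop standing in for parallel workers are all hypothetical choices made for illustration.

```python
import numpy as np


def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal gradient for (1/2n)||y - Xb||^2 + lam * sum_g sqrt(p_g) * ||b_g||_2.

    `groups` is a list of index lists, one per (non-overlapping) group.
    """
    n, p = X.shape
    beta = np.zeros(p)
    # Step size = 1 / Lipschitz constant of the smooth part's gradient.
    step = n / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y) / n)
        # Group soft-thresholding: shrink each group's norm, or zero it out.
        for g in groups:
            norm = np.linalg.norm(z[g])
            thresh = step * lam * np.sqrt(len(g))
            z[g] = 0.0 if norm <= thresh else (1.0 - thresh / norm) * z[g]
        beta = z
    return beta


def dc_group_lasso_sketch(X, y, groups, lam, n_parts):
    """Assumed two-round scheme: vote on groups, then average local refits."""
    parts = np.array_split(np.arange(X.shape[0]), n_parts)

    # Round 1 (assumption): local group selection, aggregated by majority vote.
    votes = np.zeros(len(groups))
    for idx in parts:
        b = group_lasso(X[idx], y[idx], groups, lam)
        votes += np.array([np.linalg.norm(b[g]) > 0 for g in groups])
    selected = [g for g, v in zip(groups, votes) if v > n_parts / 2]

    beta = np.zeros(X.shape[1])
    if not selected:
        return beta
    cols = np.concatenate(selected)

    # Round 2 (assumption): local least-squares refit on the voted columns,
    # aggregated by simple averaging.
    local = [np.linalg.lstsq(X[idx][:, cols], y[idx], rcond=None)[0]
             for idx in parts]
    beta[cols] = np.mean(local, axis=0)
    return beta
```

In a real deployment each pass over `parts` would run on a separate machine, and only the group votes and the refitted coefficient vectors would cross the network, which is what keeps the communication cost at two rounds under this reading.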

Cite

@article{arxiv.1612.02222,
  title  = {A Communication-Efficient Parallel Method for Group-Lasso},
  author = {Binghong Chen and Jun Zhu},
  journal = {arXiv preprint arXiv:1612.02222},
  year   = {2016}
}

Comments

7 pages