c-lasso -- a Python package for constrained sparse and robust regression and classification

Computation 2020-11-03 v1 Mathematical Software Optimization and Control Machine Learning

Abstract

We introduce c-lasso, a Python package that enables sparse and robust linear regression and classification with linear equality constraints. The underlying statistical forward model is assumed to be of the following form:

$$ y = X \beta + \sigma \epsilon \qquad \textrm{subject to} \qquad C\beta = 0 $$

Here, $X \in \mathbb{R}^{n\times d}$ is a given design matrix and $y \in \mathbb{R}^{n}$ is a continuous or binary response vector. The matrix $C$ is a general constraint matrix. The vector $\beta \in \mathbb{R}^{d}$ contains the unknown coefficients and $\sigma$ is an unknown scale. Prominent use cases are (sparse) log-contrast regression with compositional data $X$, which requires the constraint $1_d^T \beta = 0$ (Aitchison and Bacon-Shone 1984), and the Generalized Lasso, which is a special case of the described problem (see, e.g., James, Paulson, and Rusmevichientong 2020, Example 3). The c-lasso package provides estimators for inferring the unknown coefficients and scale (i.e., perspective M-estimators (Combettes and Müller 2020a)) of the form

$$ \min_{\beta \in \mathbb{R}^d, \sigma \in \mathbb{R}_{0}} f\left(X\beta - y, \sigma \right) + \lambda \left\lVert \beta\right\rVert_1 \qquad \textrm{subject to} \qquad C\beta = 0 $$

for several convex loss functions $f(\cdot,\cdot)$. This includes the constrained Lasso, the constrained scaled Lasso, and sparse Huber M-estimators with linear equality constraints.
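To make the role of the zero-sum constraint concrete, the following is a minimal sketch in plain Python; it does not use the c-lasso API. It assumes the least-squares loss $f(r, \sigma) = \lVert r \rVert_2^2$ (one simple instance of the constrained Lasso), log-transformed compositions as the design matrix, and hypothetical toy data. The helper names `log_contrast_predict`, `lasso_objective`, and `satisfies_constraint` are illustrative only. It checks that a coefficient vector satisfies $C\beta = 0$, evaluates the penalized objective, and shows that under $1_d^T \beta = 0$ the log-contrast prediction does not change when a composition is rescaled.

```python
import math


def log_contrast_predict(composition, beta):
    """Linear predictor sum_j beta_j * log(x_j) for one compositional sample."""
    return sum(b * math.log(x) for b, x in zip(beta, composition))


def lasso_objective(X, y, beta, lam):
    """Assumed least-squares instance: ||X beta - y||^2 + lam * ||beta||_1."""
    residuals = [
        sum(b * x for b, x in zip(beta, row)) - yi for row, yi in zip(X, y)
    ]
    return sum(r * r for r in residuals) + lam * sum(abs(b) for b in beta)


def satisfies_constraint(C, beta, tol=1e-12):
    """Check C beta = 0 row by row, up to a numerical tolerance."""
    return all(abs(sum(c * b for c, b in zip(row, beta))) <= tol for row in C)


# Hypothetical toy data: two three-part compositions (each row sums to 1).
compositions = [[0.2, 0.3, 0.5], [0.6, 0.1, 0.3]]
beta = [1.0, -1.0, 0.0]   # sparse coefficients that sum to zero
C = [[1.0, 1.0, 1.0]]     # zero-sum constraint 1_d^T beta = 0

feasible = satisfies_constraint(C, beta)

# Rescaling each composition (e.g. counts instead of proportions) adds
# log(1000) * sum(beta) to the predictor, which vanishes when sum(beta) = 0.
original = [log_contrast_predict(row, beta) for row in compositions]
rescaled = [
    log_contrast_predict([1000.0 * x for x in row], beta) for row in compositions
]
max_shift = max(abs(a - b) for a, b in zip(original, rescaled))

# Objective on the log-transformed design with noise-free responses:
# the residuals are zero, so only the l1 penalty contributes.
X_log = [[math.log(x) for x in row] for row in compositions]
objective = lasso_objective(X_log, original, beta, lam=0.1)

print(feasible, max_shift, objective)
```

The invariance follows because $\log(c\,x_j) = \log c + \log x_j$, so rescaling shifts the predictor by $\log c \cdot 1_d^T \beta$, which is zero exactly when the constraint holds.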

Cite

@article{arxiv.2011.00898,
  title  = {c-lasso -- a Python package for constrained sparse and robust regression and classification},
  author = {Léo Simpson and Patrick L. Combettes and Christian L. Müller},
  journal= {arXiv preprint arXiv:2011.00898},
  year   = {2020}
}