Algorithms for Sparse Support Vector Machines

Methodology 2021-10-18 v1

Abstract

Many problems in classification involve huge numbers of irrelevant features. Model selection reveals the crucial features, reduces the dimensionality of feature space, and improves model interpretation. In the support vector machine literature, model selection is achieved by $\ell_1$ penalties. These convex relaxations seriously bias parameter estimates toward 0 and tend to admit too many irrelevant features. The current paper presents an alternative that replaces penalties by sparse-set constraints. Penalties still appear, but serve a different purpose. The proximal distance principle takes a loss function $L(\boldsymbol{\beta})$ and adds the penalty $\frac{\rho}{2}\mathrm{dist}(\boldsymbol{\beta}, S_k)^2$ capturing the squared Euclidean distance of the parameter vector $\boldsymbol{\beta}$ to the sparsity set $S_k$ where at most $k$ components of $\boldsymbol{\beta}$ are nonzero. If $\boldsymbol{\beta}_\rho$ represents the minimum of the objective $f_\rho(\boldsymbol{\beta}) = L(\boldsymbol{\beta}) + \frac{\rho}{2}\mathrm{dist}(\boldsymbol{\beta}, S_k)^2$, then $\boldsymbol{\beta}_\rho$ tends to the constrained minimum of $L(\boldsymbol{\beta})$ over $S_k$ as $\rho$ tends to $\infty$. We derive two closely related algorithms to carry out this strategy. Our simulated and real examples vividly demonstrate how the algorithms achieve much better sparsity without loss of classification power.
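
The proximal distance objective described above can be illustrated with a minimal Python sketch. It is not the paper's SVM loss or either of its two algorithms: the quadratic `toy_loss` and the helper names `project_sparse`, `dist_sq`, `f_rho`, and `minimize_f_rho` are assumptions introduced purely for illustration. The sketch relies on the standard fact that the Euclidean projection onto $S_k$ keeps the $k$ largest-magnitude components, and it minimizes $f_\rho$ by repeatedly majorizing the distance term at the current projection.

```python
def project_sparse(beta, k):
    """Euclidean projection onto S_k: keep the k largest-magnitude entries."""
    if k <= 0:
        return [0.0] * len(beta)
    # Indices sorted by decreasing magnitude (ties keep index order).
    order = sorted(range(len(beta)), key=lambda i: -abs(beta[i]))
    keep = set(order[:k])
    return [b if i in keep else 0.0 for i, b in enumerate(beta)]


def dist_sq(beta, k):
    """Squared Euclidean distance from beta to the sparsity set S_k."""
    proj = project_sparse(beta, k)
    return sum((b - p) ** 2 for b, p in zip(beta, proj))


def toy_loss(beta, target):
    """Hypothetical stand-in loss L(beta) = 0.5 * ||beta - target||^2."""
    return 0.5 * sum((b - t) ** 2 for b, t in zip(beta, target))


def f_rho(beta, target, rho, k):
    """Penalized objective f_rho(beta) = L(beta) + (rho / 2) * dist(beta, S_k)^2."""
    return toy_loss(beta, target) + 0.5 * rho * dist_sq(beta, k)


def minimize_f_rho(target, rho, k, iters=200):
    """Minimize f_rho for the toy loss by distance majorization.

    Each step replaces dist(beta, S_k)^2 with ||beta - P(beta_n)||^2, where
    P(beta_n) is the projection of the current iterate, and solves the
    resulting quadratic surrogate in closed form.
    """
    beta = list(target)
    for _ in range(iters):
        proj = project_sparse(beta, k)
        beta = [(t + rho * p) / (1.0 + rho) for t, p in zip(target, proj)]
    return beta


if __name__ == "__main__":
    target = [3.0, -0.5, 2.0, 0.1]
    for rho in (1.0, 100.0, 1e6):
        beta_rho = minimize_f_rho(target, rho, k=2)
        print(rho, beta_rho, dist_sq(beta_rho, 2))
```

In this toy setting the constrained minimum over $S_2$ is simply the projection of `target`, so the printed iterates should approach it, and the distance to $S_2$ should shrink, as `rho` grows.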

Cite

@article{arxiv.2110.07691,
  title  = {Algorithms for Sparse Support Vector Machines},
  author = {Alfonso Landeros and Kenneth Lange},
  journal= {arXiv preprint arXiv:2110.07691},
  year   = {2021}
}

Comments

Main text: 21 pages, 3 figures, 4 tables; Appendix: 6 pages, 2 figures