Sparse Classification: a scalable discrete optimization perspective

Dimitris Bertsimas; Jean Pauphilet; Bart Van Parys

doi:10.1007/s10994-021-06085-5

Optimization and Control 2025-01-08 v4

Abstract

We formulate the sparse classification problem of $n$ samples with $p$ features as a binary convex optimization problem and propose a cutting-plane algorithm to solve it exactly. For sparse logistic regression and sparse SVM, our algorithm finds optimal solutions for $n$ and $p$ in the $10,000$s within minutes. On synthetic data, our algorithm achieves perfect support recovery in the large sample regime. Namely, there exists an $n_0$ such that the algorithm takes a long time to find the optimal solution and does not recover the correct support for $n<n_0$, while for $n\geqslant n_0$, the algorithm quickly detects all the true features and does not return any false features. In contrast, while Lasso accurately detects all the true features, it persistently returns incorrect features, even as the number of observations increases. Consequently, on numerous real-world experiments, our outer-approximation algorithm returns sparser classifiers while achieving predictive accuracy similar to that of Lasso. To support our observations, we analyze conditions on the sample size needed to ensure full support recovery in classification. Under some assumptions on the data generating process, we prove that information-theoretic limitations impose $n_0 < C \left(2 + \sigma^2\right) k \log(p-k)$, for some constant $C>0$.
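
To get a feel for how the bound $C \left(2 + \sigma^2\right) k \log(p-k)$ scales, the following minimal sketch evaluates the expression for given $k$, $p$ and $\sigma$. The abstract does not specify the constant $C$, so the default `C=1.0` is purely illustrative. The function name `support_recovery_scale` is hypothetical, and the use of the natural logarithm is an assumption.

```python
import math


def support_recovery_scale(k, p, sigma, C=1.0):
    """Evaluate C * (2 + sigma^2) * k * log(p - k).

    k     : number of relevant features (sparsity level)
    p     : total number of features
    sigma : noise level
    C     : unknown positive constant; 1.0 is an illustrative placeholder
    """
    if not 0 < k < p:
        raise ValueError("need 0 < k < p so that log(p - k) is defined")
    if C <= 0:
        raise ValueError("C must be positive")
    # Natural logarithm is assumed here; the abstract does not state the base.
    return C * (2.0 + sigma ** 2) * k * math.log(p - k)


if __name__ == "__main__":
    # Illustrative values only: 10 true features among 10,000, unit noise.
    print(support_recovery_scale(k=10, p=10_000, sigma=1.0))
```

Because $C$ is unknown, only relative comparisons are meaningful: for fixed $p$ and $\sigma$ the expression grows roughly linearly in $k$, and only logarithmically in $p$.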

Keywords

compressed sensing; statistical estimation; stochastic optimization

Cite

@article{arxiv.1710.01352,
  title  = {Sparse Classification: a scalable discrete optimization perspective},
  author = {Dimitris Bertsimas and Jean Pauphilet and Bart Van Parys},
  journal= {arXiv preprint arXiv:1710.01352},
  year   = {2025}
}
