Sparse Optimization on Measures with Over-parameterized Gradient Descent

Lenaic Chizat

Subjects: Optimization and Control, Machine Learning | Version: v2 (2020-11-04)

Authors: Lenaic Chizat

Abstract

Minimizing a convex function of a measure with a sparsity-inducing penalty is a typical problem arising, e.g., in sparse spikes deconvolution or the training of two-layer neural networks. We show that this problem can be solved by discretizing the measure and running non-convex gradient descent on the positions and weights of the particles. For measures on a $d$-dimensional manifold and under some non-degeneracy assumptions, this leads to a global optimization algorithm with a complexity scaling as $\log(1/\epsilon)$ in the desired accuracy $\epsilon$, instead of $\epsilon^{-d}$ for convex methods. The key theoretical tools are a local convergence analysis in Wasserstein space and an analysis of a perturbed mirror descent in the space of measures. Our bounds involve quantities that are exponential in $d$, which is unavoidable under our assumptions.
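To make the particle idea in the abstract concrete, here is a minimal sketch on a toy 1-D sparse spikes deconvolution problem: the measure is discretized into many more particles than there are true spikes, and plain gradient descent is run jointly on particle positions and weights. This is not the paper's exact algorithm. The Gaussian kernel, the squared loss, the parametrization $w_i = r_i^2$ (used here only to keep weights nonnegative), the step size, and all function names are assumptions made for illustration.

```python
import numpy as np


def gaussian_features(s, x, sigma):
    """Kernel matrix K[a, i] = exp(-(s_a - x_i)^2 / (2 sigma^2)) and the differences s_a - x_i."""
    diff = s[:, None] - x[None, :]
    return np.exp(-diff**2 / (2.0 * sigma**2)), diff


def objective(r, x, s, y, lam, sigma):
    """Squared data-fit term plus an l1-type penalty on the nonnegative weights w = r**2."""
    K, _ = gaussian_features(s, x, sigma)
    resid = K @ (r**2) - y
    return 0.5 * np.mean(resid**2) + lam * np.sum(r**2)


def gradients(r, x, s, y, lam, sigma):
    """Gradients of the objective with respect to r (weights) and x (positions)."""
    K, diff = gaussian_features(s, x, sigma)
    resid = K @ (r**2) - y
    # Derivative of the data-fit term with respect to each weight w_i.
    g = K.T @ resid / s.size
    grad_r = 2.0 * r * (g + lam)  # chain rule through w_i = r_i**2
    # dK[a, i]/dx_i = K[a, i] * (s_a - x_i) / sigma^2
    grad_x = (r**2) * ((K * diff / sigma**2).T @ resid) / s.size
    return grad_r, grad_x


def particle_gradient_descent(s, y, m=30, lam=1e-3, sigma=0.1, step=0.02, n_iter=3000):
    """Over-parameterized gradient descent: m particles, initialized on a uniform grid."""
    x = np.linspace(s.min(), s.max(), m)  # particle positions
    r = np.full(m, 0.1)                   # small uniform initial weights (w = 0.01)
    history = [objective(r, x, s, y, lam, sigma)]
    for _ in range(n_iter):
        grad_r, grad_x = gradients(r, x, s, y, lam, sigma)
        r = r - step * grad_r
        x = x - step * grad_x
        history.append(objective(r, x, s, y, lam, sigma))
    return r**2, x, np.array(history)


def make_toy_signal(n=200, sigma=0.1):
    """Noise-free observation of two spikes (hypothetical ground truth) blurred by the kernel."""
    s = np.linspace(0.0, 1.0, n)
    true_x = np.array([0.3, 0.7])
    true_w = np.array([1.0, 0.5])
    K, _ = gaussian_features(s, true_x, sigma)
    return s, K @ true_w


if __name__ == "__main__":
    s, y = make_toy_signal()
    w, x, history = particle_gradient_descent(s, y)
    print("objective: start", history[0], "end", history[-1])
    print("positions of the heaviest particles:", np.sort(x[np.argsort(w)[-6:]]))
```

In this toy setting one would expect the heaviest particles to gather near the true spike locations while the others keep small weights, but the sketch makes no claim about the convergence rates stated in the abstract, which rely on the paper's specific algorithm and non-degeneracy assumptions.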

Keywords

compressed sensing, convex optimization, stochastic optimization

Cite

@article{arxiv.1907.10300,
  title  = {Sparse Optimization on Measures with Over-parameterized Gradient Descent},
  author = {Lenaic Chizat},
  journal= {arXiv preprint arXiv:1907.10300},
  year   = {2020}
}
