Adaptive Regularization for Weight Matrices

Koby Crammer; Gal Chechik

Categories: Machine Learning, Artificial Intelligence. Version v1, 2012-06-22.

Abstract

Algorithms for learning distributions over weight vectors, such as AROW, were recently shown empirically to achieve state-of-the-art performance on various problems, with strong theoretical guarantees. Extending these algorithms to matrix models poses challenges, since the number of free parameters in the covariance of the distribution scales as $n^4$ with the dimension $n$ of the matrix (a full covariance over the $n^2$ entries is an $n^2 \times n^2$ matrix), and $n$ tends to be large in real applications. We describe, analyze and experiment with two new algorithms for learning distributions over matrix models. Our first algorithm maintains a diagonal covariance over the parameters and can handle large covariance matrices. The second algorithm factors the covariance to capture inter-feature correlations while keeping the number of parameters linear in the size of the original matrix. We analyze both algorithms in the mistake-bound model and show that our approach achieves higher precision than other algorithms on two tasks: retrieving similar images and ranking similar documents. The factored algorithm is shown to attain a faster convergence rate.
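The parameter counts behind the $n^4$ scaling, and what a diagonal covariance buys, can be illustrated with a short Python sketch. It assumes a model whose score is the inner product $\langle W, X \rangle$ between an $n \times n$ weight matrix $W$ and an $n \times n$ input matrix $X$ (for example an outer product of two feature vectors), with both flattened to vectors of length $n^2$. The update is a diagonal approximation of the standard vector AROW rule, not the authors' exact algorithm, and the Kronecker-product count is only one hypothetical way to factor the covariance with $O(n^2)$ parameters. The function names `covariance_param_counts`, `outer` and `arow_diag_update`, and the regularization parameter `r`, are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def covariance_param_counts(n: int) -> Dict[str, int]:
    """Count covariance parameters for a distribution over an n x n weight matrix."""
    d = n * n  # number of entries in W, i.e. the length of vec(W)
    return {
        "full": d * d,       # full d x d covariance: n^4 entries
        "diagonal": d,       # one variance per weight: n^2 entries
        "kronecker": 2 * d,  # hypothetical factorization A kron B, A and B both n x n: 2 n^2 entries
    }


def outer(p: List[float], q: List[float]) -> List[float]:
    """Row-major flattening of the outer product p q^T, used as a matrix input X."""
    return [pi * qj for pi in p for qj in q]


def arow_diag_update(
    w: List[float],
    sigma: List[float],
    x: List[float],
    y: int,
    r: float = 1.0,
) -> Tuple[List[float], List[float]]:
    """One AROW-style step with a diagonal covariance (illustrative sketch).

    w, sigma and x are the flattened mean W, per-weight variances and input X;
    y is the label in {-1, +1}; r > 0 is the (assumed) AROW regularization parameter.
    """
    margin = y * sum(wi * xi for wi, xi in zip(w, x))            # y * <W, X>
    confidence = sum(si * xi * xi for si, xi in zip(sigma, x))  # vec(X)^T diag(sigma) vec(X)
    if margin >= 1.0:
        return list(w), list(sigma)  # hinge loss is zero: no update
    beta = 1.0 / (confidence + r)
    alpha = (1.0 - margin) * beta
    # Mean step: W <- W + alpha * y * Sigma X (Sigma is diagonal, so this is elementwise).
    new_w = [wi + alpha * y * si * xi for wi, si, xi in zip(w, sigma, x)]
    # Keep only the diagonal of Sigma X X^T Sigma: sigma_i <- sigma_i - beta * (sigma_i * x_i)^2.
    new_sigma = [si - beta * (si * xi) ** 2 for si, xi in zip(sigma, x)]
    return new_w, new_sigma


if __name__ == "__main__":
    print(covariance_param_counts(100))
    x = outer([1.0, 0.0], [0.0, 1.0])  # X = p q^T for a 2 x 2 model
    w, sigma = arow_diag_update([0.0] * 4, [1.0] * 4, x, y=1)
    print(w, sigma)
```

For $n = 100$ the full covariance already needs $10^8$ entries, against $10^4$ for the diagonal form. In the sketch, a variance shrinks only on coordinates where $X$ is nonzero, so weights that have been observed often take smaller steps on later updates.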

Keywords

distributionally robust optimization, regularization, nonnegative matrix factorization

Cite

@article{arxiv.1206.4639,
  title  = {Adaptive Regularization for Weight Matrices},
  author = {Koby Crammer and Gal Chechik},
  journal= {arXiv preprint arXiv:1206.4639},
  year   = {2012}
}

Comments

ICML 2012