Compressing Word Embeddings

Martin Andrews

Subjects: Computation and Language; Machine Learning. Version v2, 2016-05-17.

Abstract

Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic. However, these vector space representations (created through large-scale text analysis) are typically stored verbatim, since their internal structure is opaque. Word-analogy tests are used here to monitor the level of detail retained in compressed re-representations of the same vector space, and thereby to investigate the trade-off between reduced memory usage and expressiveness. A simple scheme is outlined that can reduce the memory footprint of a state-of-the-art embedding by a factor of 10, with only minimal impact on performance. Then, using the same 'bit budget', a binary (approximate) factorisation of the same space is also explored, with the aim of creating an equivalent representation with better interpretability.
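The abstract does not spell out the compression scheme itself. As a minimal sketch of the general idea only (not the paper's method), the code below applies uniform scalar quantisation to one embedding vector, storing each 32-bit float component as a hypothetical 3-bit integer code plus one (offset, step) pair per vector. A 3-bit budget is an assumption chosen because 32 / 3 is roughly the factor of 10 mentioned above; cosine similarity between the original and reconstructed vectors serves as a rough proxy for how much geometric detail survives, since the analogy tests depend on that geometry.

```python
import math


def quantize(vec, bits):
    """Uniformly quantise a list of floats to integer codes of `bits` bits each.

    Illustrative assumption: a single (lo, step) pair is stored per vector.
    Returns (codes, lo, step) so the vector can be approximately rebuilt.
    """
    levels = (1 << bits) - 1  # highest code value, e.g. 7 for 3 bits
    lo, hi = min(vec), max(vec)
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((x - lo) / step) for x in vec]  # each code lies in 0..levels
    return codes, lo, step


def dequantize(codes, lo, step):
    """Map integer codes back to approximate float values."""
    return [lo + c * step for c in codes]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def compression_ratio(bits, float_bits=32):
    """Per-component memory saving, ignoring the small per-vector (lo, step) overhead."""
    return float_bits / bits


if __name__ == "__main__":
    vec = [0.12, -0.53, 0.88, 0.05, -0.21, 0.40]  # toy 6-d "embedding"
    codes, lo, step = quantize(vec, bits=3)
    approx = dequantize(codes, lo, step)
    print(codes)                           # [3, 0, 7, 3, 2, 5]
    print(round(compression_ratio(3), 2))  # 10.67
    print(cosine(vec, approx) > 0.95)      # True
```

Storing one offset and step per vector keeps the overhead small for realistic dimensionalities (a few hundred components). Finer-grained schemes, such as per-dimension ranges or non-uniform code books, trade a little extra storage for lower reconstruction error.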

Keywords

word embeddings, source coding, representation learning

Cite

@article{arxiv.1511.06397,
  title  = {Compressing Word Embeddings},
  author = {Martin Andrews},
  journal= {arXiv preprint arXiv:1511.06397},
  year   = {2016}
}

Comments

10 pages, 0 figures, submitted to ICONIP-2016. Previous experimental results were submitted to ICLR-2016, but the paper has been significantly updated, since a new experimental set-up worked much better.