Sparse Overcomplete Word Vector Representations

Manaal Faruqui; Yulia Tsvetkov; Dani Yogatama; Chris Dyer; Noah Smith

Computation and Language 2015-06-08 v1

Abstract

Current distributed representations of words show little resemblance to theories of lexical semantics. The former are dense and uninterpretable, the latter largely based on familiar, discrete classes (e.g., supersenses) and relations (e.g., synonymy and hypernymy). We propose methods that transform word vectors into sparse (and optionally binary) vectors. The resulting representations are more similar to the interpretable features typically used in NLP, though they are discovered automatically from raw corpora. Because the vectors are highly sparse, they are computationally easy to work with. Most importantly, we find that they outperform the original vectors on benchmark tasks.

Keywords

word embeddings natural language parsing sparse learning

Cite

@article{arxiv.1506.02004,
  title  = {Sparse Overcomplete Word Vector Representations},
  author = {Manaal Faruqui and Yulia Tsvetkov and Dani Yogatama and Chris Dyer and Noah Smith},
  journal= {arXiv preprint arXiv:1506.02004},
  year   = {2015}
}

Comments

Proceedings of ACL 2015
