Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations
Computation and Language
2016-06-08 v2
Abstract
In this paper, we propose LexVec, a new method for generating distributed word representations. It performs low-rank, weighted factorization of the Positive Pointwise Mutual Information (PPMI) matrix via stochastic gradient descent, using a weighting scheme that assigns heavier penalties to errors on frequent co-occurrences while still accounting for negative co-occurrence. Evaluation on word similarity and analogy tasks shows that LexVec matches and often outperforms state-of-the-art methods on many of these tasks.
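To make the core idea concrete, here is a minimal sketch of building a PPMI matrix from co-occurrence counts and fitting a low-rank factorization to it with stochastic gradient descent. It is an illustration under stated assumptions, not the LexVec implementation: the function names (ppmi_matrix, factorize, squared_error), the toy counts, and all hyperparameters are hypothetical, and cells are visited uniformly, so the sketch omits the window-sampling and negative-sampling scheme that gives LexVec its weighting.

```python
import math
import random


def ppmi_matrix(counts):
    """Turn co-occurrence counts[w][c] into PPMI values.

    PMI(w, c) = log( n(w, c) * N / (n(w) * n(c)) ); PPMI clips negatives to 0.
    Cells with a zero count get PPMI 0.
    """
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    ppmi = []
    for i, row in enumerate(counts):
        out = []
        for j, n in enumerate(row):
            if n == 0:
                out.append(0.0)
            else:
                pmi = math.log(n * total / (row_sums[i] * col_sums[j]))
                out.append(max(pmi, 0.0))
        ppmi.append(out)
    return ppmi


def squared_error(ppmi, W, C):
    """Sum of squared reconstruction errors over all cells."""
    return sum(
        (sum(w * c for w, c in zip(W[i], C[j])) - ppmi[i][j]) ** 2
        for i in range(len(ppmi))
        for j in range(len(ppmi[0]))
    )


def factorize(ppmi, dim=2, epochs=500, lr=0.05, seed=0):
    """Fit word vectors W and context vectors C so that W[i] . C[j] ~ ppmi[i][j].

    Assumption: plain unweighted SGD over uniformly shuffled cells,
    NOT the sampling-based weighting described in the paper.
    """
    rng = random.Random(seed)
    n_w, n_c = len(ppmi), len(ppmi[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n_w)]
    C = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n_c)]
    cells = [(i, j) for i in range(n_w) for j in range(n_c)]
    for _ in range(epochs):
        rng.shuffle(cells)
        for i, j in cells:
            err = sum(w * c for w, c in zip(W[i], C[j])) - ppmi[i][j]
            for k in range(dim):
                w_ik, c_jk = W[i][k], C[j][k]
                # Gradient step on 0.5 * err**2 for both vectors.
                W[i][k] -= lr * err * c_jk
                C[j][k] -= lr * err * w_ik
    return W, C


# Toy usage: two words that each co-occur with only one context.
toy_counts = [[4, 0], [0, 4]]
toy_ppmi = ppmi_matrix(toy_counts)
W, C = factorize(toy_ppmi)
```

In this sketch every cell contributes equally to the loss. The abstract's weighting, with heavier penalties for errors on frequent co-occurrences, would correspond to visiting frequent pairs more often during training.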
Cite
@article{arxiv.1606.00819,
title = {Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations},
author = {Alexandre Salle and Marco Idiart and Aline Villavicencio},
journal = {arXiv preprint arXiv:1606.00819},
year = {2016}
}
Comments
Converted paper size from A4 to US Letter to avoid margin issues on arXiv