Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations

Computation and Language 2016-06-08 v2

Abstract

In this paper, we propose LexVec, a new method for generating distributed word representations that uses low-rank, weighted factorization of the Positive Point-wise Mutual Information matrix via stochastic gradient descent, employing a weighting scheme that assigns heavier penalties for errors on frequent co-occurrences while still accounting for negative co-occurrence. Evaluation on word similarity and analogy tasks shows that LexVec matches and often outperforms state-of-the-art methods on many of these tasks.
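The pipeline named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`window_pairs`, `ppmi_matrix`, `sgd_step`, `train`), the toy hyperparameters, and the choice to draw negative contexts uniformly from corpus tokens are all hypothetical. Updating once per window occurrence is one simple way to make frequent co-occurrences weigh more, while the negative samples push unobserved pairs toward a PPMI of zero.

```python
import math
import random
from collections import Counter


def window_pairs(tokens, window=2):
    """Yield (word, context) pairs from a symmetric context window."""
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield word, tokens[j]


def ppmi_matrix(tokens, window=2):
    """Sparse PPMI: max(0, log(#(w,c) * N / (#(w) * #(c)))) for observed pairs."""
    pair_counts = Counter(window_pairs(tokens, window))
    total = sum(pair_counts.values())
    word_counts, ctx_counts = Counter(), Counter()
    for (w, c), n in pair_counts.items():
        word_counts[w] += n
        ctx_counts[c] += n
    return {
        (w, c): max(0.0, math.log(n * total / (word_counts[w] * ctx_counts[c])))
        for (w, c), n in pair_counts.items()
    }


def sgd_step(w_vec, c_vec, target, lr):
    """One SGD step on 0.5 * (w_vec . c_vec - target)^2; returns the pre-update loss."""
    err = sum(a * b for a, b in zip(w_vec, c_vec)) - target
    for k in range(len(w_vec)):
        w_k, c_k = w_vec[k], c_vec[k]  # keep old values so both updates use them
        w_vec[k] -= lr * err * c_k
        c_vec[k] -= lr * err * w_k
    return 0.5 * err * err


def train(tokens, dim=8, window=2, negatives=2, epochs=200, lr=0.05, seed=0):
    """Factorize the PPMI matrix into word vectors W and context vectors C."""
    rng = random.Random(seed)
    target = ppmi_matrix(tokens, window)
    vocab = sorted(set(tokens))
    W = {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}
    C = {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}
    losses = []
    for _epoch in range(epochs):
        total, steps = 0.0, 0
        # Window sampling: a pair seen n times in the corpus gets n updates.
        for word, ctx in window_pairs(tokens, window):
            total += sgd_step(W[word], C[ctx], target[(word, ctx)], lr)
            steps += 1
            # Negative sampling (assumed scheme): contexts drawn by token
            # frequency; unobserved pairs have a PPMI target of 0.
            for _neg in range(negatives):
                neg = rng.choice(tokens)
                total += sgd_step(W[word], C[neg], target.get((word, neg), 0.0), lr)
                steps += 1
        losses.append(total / steps)
    return W, C, target, losses
```

Calling `train("the cat sat on the mat the dog sat on the rug".split())` returns the word vectors, the context vectors, the sparse PPMI targets, and the mean loss per epoch, which gives a quick check that the factorization is converging on a toy corpus.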

Cite

@article{arxiv.1606.00819,
  title   = {Matrix Factorization using Window Sampling and Negative Sampling for Improved Word Representations},
  author  = {Alexandre Salle and Marco Idiart and Aline Villavicencio},
  journal = {arXiv preprint arXiv:1606.00819},
  year    = {2016}
}

Comments

Converted paper size from A4 to US Letter to avoid margin issues on arXiv