WordNet2Vec: Corpora Agnostic Word Vectorization Method

Roman Bartusiak; Łukasz Augustyniak; Tomasz Kajdanowicz; Przemysław Kazienko; Maciej Piasecki


Computation and Language; Artificial Intelligence; Distributed, Parallel, and Cluster Computing. 2016-06-13, v1.


Abstract

The complex nature of big data resources demands new methods for structuring them, especially textual content. WordNet is a good knowledge source for a comprehensive abstraction of natural language, as good implementations of it exist for many languages. Since WordNet embeds natural language in the form of a complex network, this paper proposes a transformation mechanism, WordNet2Vec. It creates a vector for each word in WordNet. These vectors encapsulate the general position (role) of a given word relative to all other words in the natural language. Any list or set of such vectors therefore carries knowledge about the context of its components within the whole language. Such a word representation can easily be applied to many analytic tasks, such as classification or clustering. The usefulness of the WordNet2Vec method was demonstrated in sentiment analysis, i.e., classification with transfer learning on a real Amazon opinion textual dataset.
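A minimal sketch of the idea of "position towards all other words", assuming (this is not stated above) that a word's vector is its shortest-path distance to every other word in the network. The toy graph below is hand-made and hypothetical, not real WordNet data; the distance cap for unreachable words and the averaging of word vectors into a document vector (as input for a classifier) are also illustrative assumptions, and all function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical toy word network standing in for WordNet (not real data).
# Undirected edges represent some lexical relation between words.
TOY_GRAPH: Dict[str, List[str]] = {
    "good": ["great", "bad"],
    "great": ["good", "excellent"],
    "excellent": ["great"],
    "bad": ["good", "poor"],
    "poor": ["bad"],
    "island": [],  # isolated word, unreachable from all others
}


def bfs_distances(graph: Dict[str, List[str]], source: str) -> Dict[str, int]:
    """Unweighted shortest-path length from source to every reachable word."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def word_vectors(graph: Dict[str, List[str]]) -> Tuple[List[str], Dict[str, List[float]]]:
    """Map each word to its distances to all words, in a fixed sorted order.

    Assumption: unreachable words get distance len(graph), a cap larger
    than any real shortest path in the graph.
    """
    vocab = sorted(graph)
    cap = len(vocab)
    vectors = {}
    for word in vocab:
        dist = bfs_distances(graph, word)
        vectors[word] = [float(dist.get(other, cap)) for other in vocab]
    return vocab, vectors


def document_vector(tokens: List[str], vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of known tokens (an assumed document representation)."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no known words in document")
    n = len(known)
    return [sum(column) / n for column in zip(*known)]


if __name__ == "__main__":
    vocab, vecs = word_vectors(TOY_GRAPH)
    print(vocab)         # ['bad', 'excellent', 'good', 'great', 'island', 'poor']
    print(vecs["good"])  # [1.0, 2.0, 0.0, 1.0, 6.0, 2.0]
    print(document_vector(["good", "great", "unknown"], vecs))
    # [1.5, 1.5, 0.5, 0.5, 6.0, 2.5]
```

Each vector has one dimension per word in the vocabulary, and computing them needs one breadth-first search per word, i.e. O(V·(V+E)) time for V words and E edges. At the scale of a full WordNet this is the kind of all-pairs computation that benefits from distributed or parallel processing; the resulting document vectors can then be fed to any standard classifier or clustering method.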

Keywords

word embeddings, word sense disambiguation, text classification

Cite

@article{arxiv.1606.03335,
  title  = {WordNet2Vec: Corpora Agnostic Word Vectorization Method},
  author = {Roman Bartusiak and Łukasz Augustyniak and Tomasz Kajdanowicz and Przemysław Kazienko and Maciej Piasecki},
  journal= {arXiv preprint arXiv:1606.03335},
  year   = {2016}
}

Comments

29 pages, 16 figures, submitted to journal
