Related papers: WordNet2Vec: Corpora Agnostic Word Vectorization M…

Measuring Word Significance using Distributed Representations of Words

Distributed representations of words as real-valued vectors in a relatively low-dimensional space aim at extracting syntactic and semantic features from large text corpora. A recently introduced neural network, named word2vec (Mikolov et…

Computation and Language · Computer Science 2015-08-11 Adriaan M. J. Schakel, Benjamin J. Wilson

Machine Learning Sentiment Prediction based on Hybrid Document Representation

Automated sentiment analysis and opinion mining is a complex process concerning the extraction of useful subjective information from text. The explosion of user generated content on the Web, especially the fact that millions of users, on a…

Computation and Language · Computer Science 2015-12-01 Panagiotis Stalidis, Maria Giatsoglou, Konstantinos Diamantaras, George Sarigiannidis, Konstantinos Ch. Chatzisavvas

Retrofitting Word Vectors to Semantic Lexicons

Vector space word representations are learned from distributional information of words in large corpora. Although such statistics are semantically informative, they disregard the valuable information that is contained in semantic lexicons…

Computation and Language · Computer Science 2015-03-24 Manaal Faruqui, Jesse Dodge, Sujay K. Jauhar, Chris Dyer, Eduard Hovy, Noah A. Smith

Context-aware Sentiment Word Identification: sentiword2vec

Traditional sentiment analysis often uses sentiment dictionary to extract sentiment information in text and classify documents. However, emerging informal words and phrases in user generated content call for analysis aware to the context.…

Computation and Language · Computer Science 2016-12-14 Yushi Yao, Guangjian Li

AWE-CM Vectors: Augmenting Word Embeddings with a Clinical Metathesaurus

In recent years, word embeddings have been surprisingly effective at capturing intuitive characteristics of the words they represent. These vectors achieve the best results when training corpora are extremely large, sometimes billions of…

Computation and Language · Computer Science 2017-12-06 Willie Boag, Hassan Kané

A Context-theoretic Framework for Compositionality in Distributional Semantics

Techniques in which words are represented as vectors have proved useful in many applications in computational linguistics, however there is currently no general semantic formalism for representing meaning in terms of vectors. We present a…

Computation and Language · Computer Science 2015-03-17 Daoud Clarke

Semantic Holism and Word Representations in Artificial Neural Networks

Artificial neural networks are a state-of-the-art solution for many problems in natural language processing. What can we learn about language and meaning from the way artificial neural networks represent it? Word representations obtained…

Computation and Language · Computer Science 2020-03-13 Tomáš Musil

Network-Efficient Distributed Word2vec Training System for Large Vocabularies

Word2vec is a popular family of algorithms for unsupervised training of dense vector representations of words on large text corpuses. The resulting vectors have been shown to capture semantic relationships among their corresponding words,…

Computation and Language · Computer Science 2016-06-29 Erik Ordentlich, Lee Yang, Andy Feng, Peter Cnudde, Mihajlo Grbovic, Nemanja Djuric, Vladan Radosavljevic, Gavin Owens

vec2text with Round-Trip Translations

We investigate models that can generate arbitrary natural language text (e.g. all English sentences) from a bounded, convex and well-behaved control space. We call them universal vec2text models. Such models would allow making semantic…

Computation and Language · Computer Science 2022-09-15 Geoffrey Cideron, Sertan Girgin, Anton Raichuk, Olivier Pietquin, Olivier Bachem, Léonard Hussenot

Embedding Words and Senses Together via Joint Knowledge-Enhanced Training

Word embeddings are widely used in Natural Language Processing, mainly due to their success in capturing semantic information from massive corpora. However, their creation process does not allow the different meanings of a word to be…

Computation and Language · Computer Science 2017-06-22 Massimiliano Mancini, Jose Camacho-Collados, Ignacio Iacobacci, Roberto Navigli

Domain-Specific Word Embeddings with Structure Prediction

Complementary to finding good general word embeddings, an important question for representation learning is to find dynamic word embeddings, e.g., across time or domain. Current methods do not offer a way to use or predict information on…

Computation and Language · Computer Science 2022-10-12 Stephanie Brandl, David Lassner, Anne Baillot, Shinichi Nakajima

Utility of General and Specific Word Embeddings for Classifying Translational Stages of Research

Conventional text classification models make a bag-of-words assumption reducing text into word occurrence counts per document. Recent algorithms such as word2vec are capable of learning semantic meaning and similarity between words in an…

Computation and Language · Computer Science 2018-07-11 Vincent Major, Alisa Surkis, Yindalon Aphinyanaphongs

Dis-S2V: Discourse Informed Sen2Vec

Vector representation of sentences is important for many text processing tasks that involve clustering, classifying, or ranking sentences. Recently, distributed representation of sentences learned by neural models from unlabeled data has…

Computation and Language · Computer Science 2016-10-27 Tanay Kumar Saha, Shafiq Joty, Naeemul Hassan, Mohammad Al Hasan

Polylingual Wordnet

Princeton WordNet is one of the most important resources for natural language processing, but is only available for English. While it has been translated using the expand approach to many other languages, this is an expensive manual…

Computation and Language · Computer Science 2019-03-05 Mihael Arcan, John McCrae, Paul Buitelaar

ConceptNet 5.5: An Open Multilingual Graph of General Knowledge

Machine learning about language can be improved by supplying it with specific knowledge and sources of external information. We present here a new version of the linked open data resource ConceptNet that is particularly well suited to be…

Computation and Language · Computer Science 2018-12-12 Robyn Speer, Joshua Chin, Catherine Havasi

Context-theoretic Semantics for Natural Language: an Algebraic Framework

Techniques in which words are represented as vectors have proved useful in many applications in computational linguistics, however there is currently no general semantic formalism for representing meaning in terms of vectors. We present a…

Computation and Language · Computer Science 2020-09-23 Daoud Clarke

Joint Word Representation Learning using a Corpus and a Semantic Lexicon

Methods for learning word representations using large text corpora have received much attention lately due to their impressive performance in numerous natural language processing (NLP) tasks such as, semantic similarity measurement, and…

Computation and Language · Computer Science 2015-11-23 Danushka Bollegala, Alsuhaibani Mohammed, Takanori Maehara, Ken-ichi Kawarabayashi

An Intelligent CNN-VAE Text Representation Technology Based on Text Semantics for Comprehensive Big Data

In the era of big data, a large number of text data generated by the Internet has given birth to a variety of text representation methods. In natural language processing (NLP), text representation transforms text into vectors that can be…

Machine Learning · Computer Science 2020-08-31 Genggeng Liu, Canyang Guo, Lin Xie, Wenxi Liu, Naixue Xiong, Guolong Chen

A Comprehensive Empirical Evaluation of Existing Word Embedding Approaches

Vector-based word representations help countless Natural Language Processing (NLP) tasks capture the language's semantic and syntactic regularities. In this paper, we present the characteristics of existing word embedding approaches and…

Computation and Language · Computer Science 2024-03-05 Obaidullah Zaland, Muhammad Abulaish, Mohd. Fazil

word2ket: Space-efficient Word Embeddings inspired by Quantum Entanglement

Deep learning natural language processing models often use vector word embeddings, such as word2vec or GloVe, to represent words. A discrete sequence of words can be much more easily integrated with downstream neural layers if it is…

Machine Learning · Computer Science 2020-03-04 Aliakbar Panahi, Seyran Saeedi, Tom Arodz