Related papers: Learning Numeral Embeddings
Existing approaches for learning word embeddings often assume there are sufficient occurrences for each word in the corpus, such that the representation of words can be accurately estimated from their contexts. However, in real-world…
Word embeddings are a key component of high-performing natural language processing (NLP) systems, but it remains a challenge to learn good representations for novel words on the fly, i.e., for words that did not occur in the training data.…
Word embeddings improve generalization over lexical features by placing each word in a lower-dimensional space, using distributional information obtained from unlabeled data. However, the effectiveness of word embeddings for downstream NLP…
Communication has become increasingly dynamic with the popularization of social networks and applications that allow people to express themselves and communicate instantly. In this scenario, distributed representation models have their…
Data representation is a fundamental task in machine learning. The representation of data affects the performance of the whole machine learning system. In a long history, the representation of data is done by feature engineering, and…
We propose a novel way to handle out of vocabulary (OOV) words in downstream natural language processing (NLP) tasks. We implement a network that predicts useful embeddings for OOV words based on their morphology and on the context in which…
Word embedding learning methods require a large number of occurrences of a word to accurately learn its embedding. However, out-of-vocabulary (OOV) words which do not appear in the training corpus emerge frequently in the smaller downstream…
Word embeddings are real-valued word representations able to capture lexical semantics and trained on natural language corpora. Models proposing these representations have gained popularity in the recent years, but the issue of the most…
Semantic representations of words have been successfully extracted from unlabeled corpuses using neural network models like word2vec. These representations are generally high quality and are computationally inexpensive to train, making them…
The ability to understand and work with numbers (numeracy) is critical for many complex reasoning tasks. Currently, most NLP models treat numbers in text in the same way as other tokens---they embed them as distributed vectors. Is this…
Word representation is a key component in neural-network-based sequence labeling systems. However, representations of unseen or rare words trained on the end task are usually poor for appreciable performance. This is commonly referred to as…
While word embeddings are currently predominant for natural language processing, most of existing models learn them solely from their contexts. However, these context-based word embeddings are limited since not all words' meaning can be…
Word embeddings -- distributed representations of words -- in deep learning are beneficial for many tasks in natural language processing (NLP). However, different embedding sets vary greatly in quality and characteristics of the captured…
Word Embeddings are used widely in multiple Natural Language Processing (NLP) applications. They are coordinates associated with each word in a dictionary, inferred from statistical properties of these words in a large corpus. In this paper…
Word embeddings are computed by a class of techniques within natural language processing (NLP), that create continuous vector representations of words in a language from a large text corpus. The stochastic nature of the training process of…
Word embedding, which encodes words into vectors, is an important starting point in natural language processing and commonly used in many text-based machine learning tasks. However, in most current word embedding approaches, the similarity…
State-of-the-art NLP systems represent inputs with word embeddings, but these are brittle when faced with Out-of-Vocabulary (OOV) words. To address this issue, we follow the principle of mimick-like models to generate vectors for unseen…
Word embedding is designed to represent the semantic meaning of a word with low dimensional vectors. The state-of-the-art methods of learning word embeddings (word2vec and GloVe) only use the word co-occurrence information. The learned…
Due to the ubiquitous use of embeddings as input representations for a wide range of natural language tasks, imputation of embeddings for rare and unseen words is a critical problem in language processing. Embedding imputation involves…
Text embedding models are widely used in natural language processing applications. However, their capability is often benchmarked on tasks that do not require understanding nuanced numerical information in text. As a result, it remains…