Related papers: Learning Numeral Embeddings

Few-Shot Representation Learning for Out-Of-Vocabulary Words

Existing approaches for learning word embeddings often assume there are sufficient occurrences for each word in the corpus, such that the representation of words can be accurately estimated from their contexts. However, in real-world…

Computation and Language · Computer Science · 2019-07-02 · Ziniu Hu, Ting Chen, Kai-Wei Chang, Yizhou Sun

Learning Semantic Representations for Novel Words: Leveraging Both Form and Context

Word embeddings are a key component of high-performing natural language processing (NLP) systems, but it remains a challenge to learn good representations for novel words on the fly, i.e., for words that did not occur in the training data.…

Computation and Language · Computer Science · 2018-11-12 · Timo Schick, Hinrich Schütze

Mimicking Word Embeddings using Subword RNNs

Word embeddings improve generalization over lexical features by placing each word in a lower-dimensional space, using distributional information obtained from unlabeled data. However, the effectiveness of word embeddings for downstream NLP…

Computation and Language · Computer Science · 2017-07-24 · Yuval Pinter, Robert Guthrie, Jacob Eisenstein

Deep learning models for representing out-of-vocabulary words

Communication has become increasingly dynamic with the popularization of social networks and applications that allow people to express themselves and communicate instantly. In this scenario, distributed representation models have their…

Computation and Language · Computer Science · 2024-05-30 · Johannes V. Lochter, Renato M. Silva, Tiago A. Almeida

Word and Document Embeddings based on Neural Network Approaches

Data representation is a fundamental task in machine learning. The representation of data affects the performance of the whole machine learning system. Historically, data representation has been done by feature engineering, and…

Computation and Language · Computer Science · 2016-11-21 · Siwei Lai

Predicting and interpreting embeddings for out of vocabulary words in downstream tasks

We propose a novel way to handle out-of-vocabulary (OOV) words in downstream natural language processing (NLP) tasks. We implement a network that predicts useful embeddings for OOV words based on their morphology and on the context in which…

Computation and Language · Computer Science · 2019-03-05 · Nicolas Garneau, Jean-Samuel Leboeuf, Luc Lamontagne

Trajectory-Based Meta-Learning for Out-Of-Vocabulary Word Embedding Learning

Word embedding learning methods require a large number of occurrences of a word to accurately learn its embedding. However, out-of-vocabulary (OOV) words which do not appear in the training corpus emerge frequently in the smaller downstream…

Computation and Language · Computer Science · 2021-02-25 · Gordon Buck, Andreas Vlachos

A Survey of Word Embeddings Evaluation Methods

Word embeddings are real-valued word representations able to capture lexical semantics and trained on natural language corpora. Models proposing these representations have gained popularity in recent years, but the issue of the most…

Computation and Language · Computer Science · 2018-01-30 · Amir Bakarov

Estimator Vectors: OOV Word Embeddings based on Subword and Context Clue Estimates

Semantic representations of words have been successfully extracted from unlabeled corpora using neural network models like word2vec. These representations are generally high quality and are computationally inexpensive to train, making them…

Computation and Language · Computer Science · 2019-10-24 · Raj Patel, Carlotta Domeniconi

Do NLP Models Know Numbers? Probing Numeracy in Embeddings

The ability to understand and work with numbers (numeracy) is critical for many complex reasoning tasks. Currently, most NLP models treat numbers in text in the same way as other tokens: they embed them as distributed vectors. Is this…

Computation and Language · Computer Science · 2019-09-19 · Eric Wallace, Yizhong Wang, Sujian Li, Sameer Singh, Matt Gardner

Learning Task-specific Representation for Novel Words in Sequence Labeling

Word representation is a key component in neural-network-based sequence labeling systems. However, representations of unseen or rare words trained on the end task are usually too poor to achieve appreciable performance. This is commonly referred to as…

Computation and Language · Computer Science · 2019-05-30 · Minlong Peng, Qi Zhang, Xiaoyu Xing, Tao Gui, Jinlan Fu, Xuanjing Huang

Learning Word Embeddings from Intrinsic and Extrinsic Views

While word embeddings are currently predominant for natural language processing, most existing models learn them solely from their contexts. However, these context-based word embeddings are limited since not all words' meaning can be…

Computation and Language · Computer Science · 2016-08-23 · Jifan Chen, Kan Chen, Xipeng Qiu, Qi Zhang, Xuanjing Huang, Zheng Zhang

Learning Meta-Embeddings by Using Ensembles of Embedding Sets

Word embeddings (distributed representations of words) in deep learning are beneficial for many tasks in natural language processing (NLP). However, different embedding sets vary greatly in quality and characteristics of the captured…

Computation and Language · Computer Science · 2015-12-31 · Wenpeng Yin, Hinrich Schütze

On the Learnability of Concepts: With Applications to Comparing Word Embedding Algorithms

Word embeddings are widely used in multiple Natural Language Processing (NLP) applications. They are coordinates associated with each word in a dictionary, inferred from statistical properties of these words in a large corpus. In this paper…

Computation and Language · Computer Science · 2020-06-18 · Adam Sutton, Nello Cristianini

Word Embeddings: Stability and Semantic Change

Word embeddings are computed by a class of techniques within natural language processing (NLP) that create continuous vector representations of words in a language from a large text corpus. The stochastic nature of the training process of…

Computation and Language · Computer Science · 2020-08-03 · Lucas Rettenmeier

Word Embedding based on Low-Rank Doubly Stochastic Matrix Decomposition

Word embedding, which encodes words into vectors, is an important starting point in natural language processing and commonly used in many text-based machine learning tasks. However, in most current word embedding approaches, the similarity…

Computation and Language · Computer Science · 2018-12-27 · Denis Sedov, Zhirong Yang

Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost

State-of-the-art NLP systems represent inputs with word embeddings, but these are brittle when faced with Out-of-Vocabulary (OOV) words. To address this issue, we follow the principle of mimick-like models to generate vectors for unseen…

Computation and Language · Computer Science · 2022-03-22 · Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek

Exploration on Grounded Word Embedding: Matching Words and Images with Image-Enhanced Skip-Gram Model

Word embedding is designed to represent the semantic meaning of a word with low dimensional vectors. The state-of-the-art methods of learning word embeddings (word2vec and GloVe) only use the word co-occurrence information. The learned…

Computation and Language · Computer Science · 2018-09-11 · Ruixuan Luo

Out-of-Vocabulary Embedding Imputation with Grounded Language Information by Graph Convolutional Networks

Due to the ubiquitous use of embeddings as input representations for a wide range of natural language tasks, imputation of embeddings for rare and unseen words is a critical problem in language processing. Embedding imputation involves…

Computation and Language · Computer Science · 2020-06-09 · Ziyi Yang, Chenguang Zhu, Vin Sachidananda, Eric Darve

Revealing the Numeracy Gap: An Empirical Investigation of Text Embedding Models

Text embedding models are widely used in natural language processing applications. However, their capability is often benchmarked on tasks that do not require understanding nuanced numerical information in text. As a result, it remains…

Computation and Language · Computer Science · 2025-09-09 · Ningyuan Deng, Hanyu Duan, Yixuan Tang, Yi Yang