Related papers: Bayesian Paragraph Vectors

Document Embedding with Paragraph Vectors

Paragraph Vectors has been recently proposed as an unsupervised method for learning distributed representations for pieces of texts. In their work, the authors showed that the method can learn an embedding of movie review texts which can be…

Computation and Language · Computer Science 2015-07-30 Andrew M. Dai, Christopher Olah, Quoc V. Le

Distributed Representations of Sentences and Documents

Many machine learning algorithms require the input to be represented as a fixed-length feature vector. When it comes to texts, one of the most common fixed-length features is bag-of-words. Despite their popularity, bag-of-words features…

Computation and Language · Computer Science 2014-05-26 Quoc V. Le, Tomas Mikolov

Binary Paragraph Vectors

Recently Le & Mikolov described two log-linear models, called Paragraph Vector, that can be used to learn state-of-the-art distributed representations of documents. Inspired by this work, we present Binary Paragraph Vector models: simple…

Computation and Language · Computer Science 2017-06-12 Karol Grzegorczyk, Marcin Kurdziel

word2vec Parameter Learning Explained

The word2vec model and application by Mikolov et al. have attracted a great amount of attention in recent two years. The vector representations of words learned by word2vec models have been shown to carry semantic meanings and are useful in…

Computation and Language · Computer Science 2016-06-07 Xin Rong

Word2Vec: Optimal Hyper-Parameters and Their Impact on NLP Downstream Tasks

Word2Vec is a prominent model for natural language processing (NLP) tasks. Similar inspiration is found in distributed embeddings for new state-of-the-art (SotA) deep neural networks. However, wrong combination of hyper-parameters can…

Computation and Language · Computer Science 2021-04-20 Tosin P. Adewumi, Foteini Liwicki, Marcus Liwicki

Towards a Theoretical Understanding of Word and Relation Representation

Representing words by vectors, or embeddings, enables computational reasoning and is foundational to automating natural language tasks. For example, if word embeddings of similar words contain similar values, word similarity can be readily…

Computation and Language · Computer Science 2022-02-02 Carl Allen

Domain-Specific Word Embeddings with Structure Prediction

Complementary to finding good general word embeddings, an important question for representation learning is to find dynamic word embeddings, e.g., across time or domain. Current methods do not offer a way to use or predict information on…

Computation and Language · Computer Science 2022-10-12 Stephanie Brandl, David Lassner, Anne Baillot, Shinichi Nakajima

Encouraging Paragraph Embeddings to Remember Sentence Identity Improves Classification

While paragraph embedding models are remarkably effective for downstream classification tasks, what they learn and encode into a single vector remains opaque. In this paper, we investigate a state-of-the-art paragraph embedding method…

Computation and Language · Computer Science 2019-06-11 Tu Vu, Mohit Iyyer

Can Network Embedding of Distributional Thesaurus be Combined with Word Vectors for Better Representation?

Distributed representations of words learned from text have proved to be successful in various natural language processing tasks in recent times. While some methods represent words as vectors computed from text using predictive model…

Computation and Language · Computer Science 2018-02-20 Abhik Jana, Pawan Goyal

Word-Graph2vec: An efficient word embedding approach on word co-occurrence graph using random walk technique

Word embedding has become ubiquitous and is widely used in various natural language processing (NLP) tasks, such as web retrieval, web semantic analysis, and machine translation, and so on. Unfortunately, training the word embedding in a…

Computation and Language · Computer Science 2023-12-29 Wenting Li, Jiahong Xue, Xi Zhang, Huacan Chen, Zeyu Chen, Feijuan Huang, Yuanzhe Cai

word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings of Structured Data

Vector representations of graphs and relational structures, whether hand-crafted feature vectors or learned representations, enable us to apply standard data analysis and machine learning techniques to the structures. A wide range of…

Machine Learning · Computer Science 2020-03-31 Martin Grohe

Topic2Vec: Learning Distributed Representations of Topics

Latent Dirichlet Allocation (LDA) mining thematic structure of documents plays an important role in natural language processing and machine learning areas. However, the probability distribution from LDA only describes the statistical…

Computation and Language · Computer Science 2015-06-30 Li-Qiang Niu, Xin-Yu Dai

From Word Vectors to Multimodal Embeddings: Techniques, Applications, and Future Directions For Large Language Models

Word embeddings and language models have transformed natural language processing (NLP) by facilitating the representation of linguistic elements in continuous vector spaces. This review visits foundational concepts such as the…

Computation and Language · Computer Science 2025-12-03 Charles Zhang, Benji Peng, Xintian Sun, Qian Niu, Junyu Liu, Keyu Chen, Ming Li, Pohsun Feng, Ziqian Bi, Ming Liu, Yichao Zhang, Xinyuan Song, Cheng Fei, Caitlyn Heqi Yin, Lawrence KQ Yan, Hongyang He, Tianyang Wang

Dynamic Word Embeddings

We present a probabilistic language model for time-stamped text data which tracks the semantic evolution of individual words over time. The model represents words and contexts by latent trajectories in an embedding space. At each moment in…

Machine Learning · Statistics 2017-07-19 Robert Bamler, Stephan Mandt

Class Vectors: Embedding representation of Document Classes

Distributed representations of words and paragraphs as semantic embeddings in high dimensional data are used across a number of Natural Language Understanding tasks such as retrieval, translation, and classification. In this work, we…

Computation and Language · Computer Science 2015-08-04 Devendra Singh Sachan, Shailesh Kumar

The Corpus Replication Task

In the field of Natural Language Processing (NLP), we revisit the well-known word embedding algorithm word2vec. Word embeddings identify words by vectors such that the words' distributional similarity is captured. Unexpectedly, besides…

Machine Learning · Computer Science 2018-06-22 Tobias Eichinger

Dis-S2V: Discourse Informed Sen2Vec

Vector representation of sentences is important for many text processing tasks that involve clustering, classifying, or ranking sentences. Recently, distributed representation of sentences learned by neural models from unlabeled data has…

Computation and Language · Computer Science 2016-10-27 Tanay Kumar Saha, Shafiq Joty, Naeemul Hassan, Mohammad Al Hasan

Searching for Discriminative Words in Multidimensional Continuous Feature Space

Word feature vectors have been proven to improve many NLP tasks. With recent advances in unsupervised learning of these feature vectors, it became possible to train it with much more data, which also resulted in better quality of learned…

Computation and Language · Computer Science 2022-11-29 Marius Sajgalik, Michal Barla, Maria Bielikova

Evaluating vector-space models of analogy

Vector-space representations provide geometric tools for reasoning about the similarity of a set of objects and their relationships. Recent machine learning methods for deriving vector-space embeddings of words (e.g., word2vec) have…

Computation and Language · Computer Science 2017-06-12 Dawn Chen, Joshua C. Peterson, Thomas L. Griffiths

Measuring Word Significance using Distributed Representations of Words

Distributed representations of words as real-valued vectors in a relatively low-dimensional space aim at extracting syntactic and semantic features from large text corpora. A recently introduced neural network, named word2vec (Mikolov et…

Computation and Language · Computer Science 2015-08-11 Adriaan M. J. Schakel, Benjamin J. Wilson