Related papers: Document Embedding with Paragraph Vectors

Class Vectors: Embedding representation of Document Classes

Distributed representations of words and paragraphs as semantic embeddings in high dimensional data are used across a number of Natural Language Understanding tasks such as retrieval, translation, and classification. In this work, we…

Computation and Language · Computer Science 2015-08-04 Devendra Singh Sachan , Shailesh Kumar

Learning Document Embeddings by Predicting N-grams for Sentiment Classification of Long Movie Reviews

Despite the loss of semantic information, bag-of-ngram based methods still achieve state-of-the-art results for tasks such as sentiment classification of long movie reviews. Many document embeddings methods have been proposed to capture…

Computation and Language · Computer Science 2016-04-26 Bofang Li , Tao Liu , Xiaoyong Du , Deyuan Zhang , Zhe Zhao

Bayesian Paragraph Vectors

Word2vec (Mikolov et al., 2013) has proven to be successful in natural language processing by capturing the semantic relationships between different words. Built on top of single-word embeddings, paragraph vectors (Le and Mikolov, 2014)…

Computation and Language · Computer Science 2017-12-11 Geng Ji , Robert Bamler , Erik B. Sudderth , Stephan Mandt

Representing Documents and Queries as Sets of Word Embedded Vectors for Information Retrieval

A major difficulty in applying word vector embeddings in IR is in devising an effective and efficient strategy for obtaining representations of compound units of text, such as whole documents, (in comparison to the atomic words), for the…

Information Retrieval · Computer Science 2016-06-28 Dwaipayan Roy , Debasis Ganguly , Mandar Mitra , Gareth J. F. Jones

Evaluating the Utility of Document Embedding Vector Difference for Relation Learning

Recent work has demonstrated that vector offsets obtained by subtracting pretrained word embedding vectors can be used to predict lexical relations with surprising accuracy. Inspired by this finding, in this paper, we extend the idea to the…

Computation and Language · Computer Science 2019-07-19 Jingyuan Zhang , Timothy Baldwin

word2vec, node2vec, graph2vec, X2vec: Towards a Theory of Vector Embeddings of Structured Data

Vector representations of graphs and relational structures, whether hand-crafted feature vectors or learned representations, enable us to apply standard data analysis and machine learning techniques to the structures. A wide range of…

Machine Learning · Computer Science 2020-03-31 Martin Grohe

Binary Paragraph Vectors

Recently Le & Mikolov described two log-linear models, called Paragraph Vector, that can be used to learn state-of-the-art distributed representations of documents. Inspired by this work, we present Binary Paragraph Vector models: simple…

Computation and Language · Computer Science 2017-06-12 Karol Grzegorczyk , Marcin Kurdziel

Distributed Representations of Sentences and Documents

Many machine learning algorithms require the input to be represented as a fixed-length feature vector. When it comes to texts, one of the most common fixed-length features is bag-of-words. Despite their popularity, bag-of-words features…

Computation and Language · Computer Science 2014-05-26 Quoc V. Le , Tomas Mikolov

Vector representations of text data in deep learning

In this dissertation we report results of our research on dense distributed representations of text data. We propose two novel neural models for learning such representations. The first model learns representations at the document level,…

Computation and Language · Computer Science 2019-01-08 Karol Grzegorczyk

Encouraging Paragraph Embeddings to Remember Sentence Identity Improves Classification

While paragraph embedding models are remarkably effective for downstream classification tasks, what they learn and encode into a single vector remains opaque. In this paper, we investigate a state-of-the-art paragraph embedding method…

Computation and Language · Computer Science 2019-06-11 Tu Vu , Mohit Iyyer

Semantic Regularities in Document Representations

Recent work exhibited that distributed word representations are good at capturing linguistic regularities in language. This allows vector-oriented reasoning based on simple linear algebra between words. Since many different methods have…

Computation and Language · Computer Science 2016-03-25 Fei Sun , Jiafeng Guo , Yanyan Lan , Jun Xu , Xueqi Cheng

Can Network Embedding of Distributional Thesaurus be Combined with Word Vectors for Better Representation?

Distributed representations of words learned from text have proved to be successful in various natural language processing tasks in recent times. While some methods represent words as vectors computed from text using predictive model…

Computation and Language · Computer Science 2018-02-20 Abhik Jana , Pawan Goyal

Making Fast Graph-based Algorithms with Graph Metric Embeddings

The computation of distance measures between nodes in graphs is inefficient and does not scale to large graphs. We explore dense vector representations as an effective way to approximate the same information: we introduce a simple yet…

Computation and Language · Computer Science 2019-06-18 Andrey Kutuzov , Mohammad Dorgham , Oleksiy Oliynyk , Chris Biemann , Alexander Panchenko

Network Vector: Distributed Representations of Networks with Global Context

We propose a neural embedding algorithm called Network Vector, which learns distributed representations of nodes and the entire networks simultaneously. By embedding networks in a low-dimensional space, the algorithm allows us to compare…

Social and Information Networks · Computer Science 2017-09-11 Hao Wu , Kristina Lerman

Category Enhanced Word Embedding

Distributed word representations have been demonstrated to be effective in capturing semantic and syntactic regularities. Unsupervised representation learning from large unlabeled corpora can learn similar representations for those words…

Computation and Language · Computer Science 2015-12-01 Chunting Zhou , Chonglin Sun , Zhiyuan Liu , Francis C. M. Lau

Empirical Evaluation of Embedding Models in the Context of Text Classification in Document Review in Construction Delay Disputes

Text embeddings are numerical representations of text data, where words, phrases, or entire documents are converted into vectors of real numbers. These embeddings capture semantic meanings and relationships between text elements in a…

Information Retrieval · Computer Science 2025-01-20 Fusheng Wei , Robert Neary , Han Qin , Qiang Mao , Jianping Zhang

Towards a Theoretical Understanding of Word and Relation Representation

Representing words by vectors, or embeddings, enables computational reasoning and is foundational to automating natural language tasks. For example, if word embeddings of similar words contain similar values, word similarity can be readily…

Computation and Language · Computer Science 2022-02-02 Carl Allen

Learning to Distill: The Essence Vector Modeling Framework

In the context of natural language processing, representation learning has emerged as a newly active research subject because of its excellent performance in many applications. Learning representations of words is a pioneering study in this…

Computation and Language · Computer Science 2016-11-23 Kuan-Yu Chen , Shih-Hung Liu , Berlin Chen , Hsin-Min Wang

Semantic Sentence Embeddings for Paraphrasing and Text Summarization

This paper introduces a sentence to vector encoding framework suitable for advanced natural language processing. Our latent representation is shown to encode sentences with common semantic information with similar vector representations.…

Computation and Language · Computer Science 2018-09-30 Chi Zhang , Shagan Sah , Thang Nguyen , Dheeraj Peri , Alexander Loui , Carl Salvaggio , Raymond Ptucha

Exploring Sentence Vector Spaces through Automatic Summarization

Given vector representations for individual words, it is necessary to compute vector representations of sentences for many applications in a compositional manner, often using artificial neural networks. Relatively little work has explored…

Computation and Language · Computer Science 2018-10-18 Adly Templeton , Jugal Kalita