Related papers: KeyVec: Key-semantics Preserving Document Represen…

Topic2Vec: Learning Distributed Representations of Topics

Latent Dirichlet Allocation (LDA) mining thematic structure of documents plays an important role in nature language processing and machine learning areas. However, the probability distribution from LDA only describes the statistical…

Computation and Language · Computer Science 2015-06-30 Li-Qiang Niu , Xin-Yu Dai

Top2Vec: Distributed Representations of Topics

Topic modeling is used for discovering latent semantic structure, usually referred to as topics, in a large collection of documents. The most widely used methods are Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis.…

Computation and Language · Computer Science 2020-08-24 Dimo Angelov

Semantic Regularities in Document Representations

Recent work exhibited that distributed word representations are good at capturing linguistic regularities in language. This allows vector-oriented reasoning based on simple linear algebra between words. Since many different methods have…

Computation and Language · Computer Science 2016-03-25 Fei Sun , Jiafeng Guo , Yanyan Lan , Jun Xu , Xueqi Cheng

KeyGen2Vec: Learning Document Embedding via Multi-label Keyword Generation in Question-Answering

Representing documents into high dimensional embedding space while preserving the structural similarity between document sources has been an ultimate goal for many works on text representation learning. Current embedding models, however,…

Computation and Language · Computer Science 2023-10-31 Iftitahu Ni'mah , Samaneh Khoshrou , Vlado Menkovski , Mykola Pechenizkiy

Dis-S2V: Discourse Informed Sen2Vec

Vector representation of sentences is important for many text processing tasks that involve clustering, classifying, or ranking sentences. Recently, distributed representation of sentences learned by neural models from unlabeled data has…

Computation and Language · Computer Science 2016-10-27 Tanay Kumar Saha , Shafiq Joty , Naeemul Hassan , Mohammad Al Hasan

Category Enhanced Word Embedding

Distributed word representations have been demonstrated to be effective in capturing semantic and syntactic regularities. Unsupervised representation learning from large unlabeled corpora can learn similar representations for those words…

Computation and Language · Computer Science 2015-12-01 Chunting Zhou , Chonglin Sun , Zhiyuan Liu , Francis C. M. Lau

Towards a Theoretical Understanding of Word and Relation Representation

Representing words by vectors, or embeddings, enables computational reasoning and is foundational to automating natural language tasks. For example, if word embeddings of similar words contain similar values, word similarity can be readily…

Computation and Language · Computer Science 2022-02-02 Carl Allen

Can Network Embedding of Distributional Thesaurus be Combined with Word Vectors for Better Representation?

Distributed representations of words learned from text have proved to be successful in various natural language processing tasks in recent times. While some methods represent words as vectors computed from text using predictive model…

Computation and Language · Computer Science 2018-02-20 Abhik Jana , Pawan Goyal

Learning to Match Using Local and Distributed Representations of Text for Web Search

Models such as latent semantic analysis and those based on neural embeddings learn distributed representations of text, and match the query against the document in the latent semantic space. In traditional information retrieval models, on…

Information Retrieval · Computer Science 2016-10-27 Bhaskar Mitra , Fernando Diaz , Nick Craswell

Hierarchical Neural Language Models for Joint Representation of Streaming Documents and their Content

We consider the problem of learning distributed representations for documents in data streams. The documents are represented as low-dimensional vectors and are jointly learned with distributed vector representations of word tokens using a…

Computation and Language · Computer Science 2016-06-29 Nemanja Djuric , Hao Wu , Vladan Radosavljevic , Mihajlo Grbovic , Narayan Bhamidipati

Representation Learning of Entities and Documents from Knowledge Base Descriptions

In this paper, we describe TextEnt, a neural network model that learns distributed representations of entities and documents directly from a knowledge base (KB). Given a document in a KB consisting of words and entity annotations, we train…

Computation and Language · Computer Science 2018-06-11 Ikuya Yamada , Hiroyuki Shindo , Yoshiyasu Takefuji

Word and Document Embeddings based on Neural Network Approaches

Data representation is a fundamental task in machine learning. The representation of data affects the performance of the whole machine learning system. In a long history, the representation of data is done by feature engineering, and…

Computation and Language · Computer Science 2016-11-21 Siwei Lai

Improving Document Classification with Multi-Sense Embeddings

Efficient representation of text documents is an important building block in many NLP tasks. Research on long text categorization has shown that simple weighted averaging of word vectors for sentence representation often outperforms more…

Computation and Language · Computer Science 2019-11-20 Vivek Gupta , Ankit Saw , Pegah Nokhiz , Harshit Gupta , Partha Talukdar

Improve Document Embedding for Text Categorization Through Deep Siamese Neural Network

Due to the increasing amount of data on the internet, finding a highly-informative, low-dimensional representation for text is one of the main challenges for efficient natural language processing tasks including text classification. This…

Computation and Language · Computer Science 2020-06-02 Erfaneh Gharavi , Hadi Veisi

Federated Word2Vec: Leveraging Federated Learning to Encourage Collaborative Representation Learning

Large scale contextual representation models have significantly advanced NLP in recent years, understanding the semantics of text to a degree never seen before. However, they need to process large amounts of data to achieve high-quality…

Computation and Language · Computer Science 2021-05-04 Daniel Garcia Bernal , Lodovico Giaretta , Sarunas Girdzijauskas , Magnus Sahlgren

Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network

Capturing the compositional process which maps the meaning of words to that of documents is a central challenge for researchers in Natural Language Processing and Information Retrieval. We introduce a model that is able to represent the…

Computation and Language · Computer Science 2014-06-17 Misha Denil , Alban Demiraj , Nal Kalchbrenner , Phil Blunsom , Nando de Freitas

Mixing Dirichlet Topic Models and Word Embeddings to Make lda2vec

Distributed dense word vectors have been shown to be effective at capturing token-level semantic and syntactic regularities in language, while topic models can form interpretable representations over documents. In this work, we describe…

Computation and Language · Computer Science 2016-05-09 Christopher E Moody

Relevance-based Word Embedding

Learning a high-dimensional dense representation for vocabulary terms, also known as a word embedding, has recently attracted much attention in natural language processing and information retrieval tasks. The embedding vectors are typically…

Information Retrieval · Computer Science 2017-07-18 Hamed Zamani , W. Bruce Croft

Class Vectors: Embedding representation of Document Classes

Distributed representations of words and paragraphs as semantic embeddings in high dimensional data are used across a number of Natural Language Understanding tasks such as retrieval, translation, and classification. In this work, we…

Computation and Language · Computer Science 2015-08-04 Devendra Singh Sachan , Shailesh Kumar

KATE: K-Competitive Autoencoder for Text

Autoencoders have been successful in learning meaningful representations from image datasets. However, their performance on text datasets has not been widely studied. Traditional autoencoders tend to learn possibly trivial representations…

Machine Learning · Statistics 2017-06-06 Yu Chen , Mohammed J. Zaki