English
Related papers

Related papers: KeyVec: Key-semantics Preserving Document Represen…

200 papers

Latent Dirichlet Allocation (LDA) mining thematic structure of documents plays an important role in nature language processing and machine learning areas. However, the probability distribution from LDA only describes the statistical…

Computation and Language · Computer Science 2015-06-30 Li-Qiang Niu , Xin-Yu Dai

Topic modeling is used for discovering latent semantic structure, usually referred to as topics, in a large collection of documents. The most widely used methods are Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis.…

Computation and Language · Computer Science 2020-08-24 Dimo Angelov

Recent work exhibited that distributed word representations are good at capturing linguistic regularities in language. This allows vector-oriented reasoning based on simple linear algebra between words. Since many different methods have…

Computation and Language · Computer Science 2016-03-25 Fei Sun , Jiafeng Guo , Yanyan Lan , Jun Xu , Xueqi Cheng

Representing documents into high dimensional embedding space while preserving the structural similarity between document sources has been an ultimate goal for many works on text representation learning. Current embedding models, however,…

Computation and Language · Computer Science 2023-10-31 Iftitahu Ni'mah , Samaneh Khoshrou , Vlado Menkovski , Mykola Pechenizkiy

Vector representation of sentences is important for many text processing tasks that involve clustering, classifying, or ranking sentences. Recently, distributed representation of sentences learned by neural models from unlabeled data has…

Computation and Language · Computer Science 2016-10-27 Tanay Kumar Saha , Shafiq Joty , Naeemul Hassan , Mohammad Al Hasan

Distributed word representations have been demonstrated to be effective in capturing semantic and syntactic regularities. Unsupervised representation learning from large unlabeled corpora can learn similar representations for those words…

Computation and Language · Computer Science 2015-12-01 Chunting Zhou , Chonglin Sun , Zhiyuan Liu , Francis C. M. Lau

Representing words by vectors, or embeddings, enables computational reasoning and is foundational to automating natural language tasks. For example, if word embeddings of similar words contain similar values, word similarity can be readily…

Computation and Language · Computer Science 2022-02-02 Carl Allen

Distributed representations of words learned from text have proved to be successful in various natural language processing tasks in recent times. While some methods represent words as vectors computed from text using predictive model…

Computation and Language · Computer Science 2018-02-20 Abhik Jana , Pawan Goyal

Models such as latent semantic analysis and those based on neural embeddings learn distributed representations of text, and match the query against the document in the latent semantic space. In traditional information retrieval models, on…

Information Retrieval · Computer Science 2016-10-27 Bhaskar Mitra , Fernando Diaz , Nick Craswell

We consider the problem of learning distributed representations for documents in data streams. The documents are represented as low-dimensional vectors and are jointly learned with distributed vector representations of word tokens using a…

Computation and Language · Computer Science 2016-06-29 Nemanja Djuric , Hao Wu , Vladan Radosavljevic , Mihajlo Grbovic , Narayan Bhamidipati

In this paper, we describe TextEnt, a neural network model that learns distributed representations of entities and documents directly from a knowledge base (KB). Given a document in a KB consisting of words and entity annotations, we train…

Computation and Language · Computer Science 2018-06-11 Ikuya Yamada , Hiroyuki Shindo , Yoshiyasu Takefuji

Data representation is a fundamental task in machine learning. The representation of data affects the performance of the whole machine learning system. In a long history, the representation of data is done by feature engineering, and…

Computation and Language · Computer Science 2016-11-21 Siwei Lai

Efficient representation of text documents is an important building block in many NLP tasks. Research on long text categorization has shown that simple weighted averaging of word vectors for sentence representation often outperforms more…

Computation and Language · Computer Science 2019-11-20 Vivek Gupta , Ankit Saw , Pegah Nokhiz , Harshit Gupta , Partha Talukdar

Due to the increasing amount of data on the internet, finding a highly-informative, low-dimensional representation for text is one of the main challenges for efficient natural language processing tasks including text classification. This…

Computation and Language · Computer Science 2020-06-02 Erfaneh Gharavi , Hadi Veisi

Large scale contextual representation models have significantly advanced NLP in recent years, understanding the semantics of text to a degree never seen before. However, they need to process large amounts of data to achieve high-quality…

Computation and Language · Computer Science 2021-05-04 Daniel Garcia Bernal , Lodovico Giaretta , Sarunas Girdzijauskas , Magnus Sahlgren

Capturing the compositional process which maps the meaning of words to that of documents is a central challenge for researchers in Natural Language Processing and Information Retrieval. We introduce a model that is able to represent the…

Computation and Language · Computer Science 2014-06-17 Misha Denil , Alban Demiraj , Nal Kalchbrenner , Phil Blunsom , Nando de Freitas

Distributed dense word vectors have been shown to be effective at capturing token-level semantic and syntactic regularities in language, while topic models can form interpretable representations over documents. In this work, we describe…

Computation and Language · Computer Science 2016-05-09 Christopher E Moody

Learning a high-dimensional dense representation for vocabulary terms, also known as a word embedding, has recently attracted much attention in natural language processing and information retrieval tasks. The embedding vectors are typically…

Information Retrieval · Computer Science 2017-07-18 Hamed Zamani , W. Bruce Croft

Distributed representations of words and paragraphs as semantic embeddings in high dimensional data are used across a number of Natural Language Understanding tasks such as retrieval, translation, and classification. In this work, we…

Computation and Language · Computer Science 2015-08-04 Devendra Singh Sachan , Shailesh Kumar

Autoencoders have been successful in learning meaningful representations from image datasets. However, their performance on text datasets has not been widely studied. Traditional autoencoders tend to learn possibly trivial representations…

Machine Learning · Statistics 2017-06-06 Yu Chen , Mohammed J. Zaki
‹ Prev 1 2 3 10 Next ›