Related papers: Coherence-Based Distributed Document Representatio…

Paper2vec: Citation-Context Based Document Distributed Representation for Scholar Recommendation

Due to the availability of references of research papers and the rich information contained in papers, various citation analysis approaches have been proposed to identify similar documents for scholar recommendation. Despite the success…

Information Retrieval · Computer Science 2017-03-21 Han Tian, Hankz Hankui Zhuo

Graph Embedding for Mapping Interdisciplinary Research Networks

Representation learning is the first step in automating tasks such as research paper recommendation, classification, and retrieval. Due to the accelerating rate of research publication, together with the recognised benefits of…

Digital Libraries · Computer Science 2023-03-22 Eoghan Cunningham, Derek Greene

hyperdoc2vec: Distributed Representations of Hypertext Documents

Hypertext documents, such as web pages and academic papers, are of great importance in delivering information in our daily life. Although being effective on plain documents, conventional text embedding methods suffer from information loss…

Computation and Language · Computer Science 2018-05-11 Jialong Han, Yan Song, Wayne Xin Zhao, Shuming Shi, Haisong Zhang

KeyVec: Key-semantics Preserving Document Representations

Previous studies have demonstrated the empirical success of word embeddings in various applications. In this paper, we investigate the problem of learning distributed representations for text documents which many machine learning algorithms…

Computation and Language · Computer Science 2017-09-29 Bin Bi, Hao Ma

Semantic Regularities in Document Representations

Recent work exhibited that distributed word representations are good at capturing linguistic regularities in language. This allows vector-oriented reasoning based on simple linear algebra between words. Since many different methods have…

Computation and Language · Computer Science 2016-03-25 Fei Sun, Jiafeng Guo, Yanyan Lan, Jun Xu, Xueqi Cheng

Learning to Match Using Local and Distributed Representations of Text for Web Search

Models such as latent semantic analysis and those based on neural embeddings learn distributed representations of text, and match the query against the document in the latent semantic space. In traditional information retrieval models, on…

Information Retrieval · Computer Science 2016-10-27 Bhaskar Mitra, Fernando Diaz, Nick Craswell

Exploiting the Bipartite Structure of Entity Grids for Document Coherence and Retrieval

Document coherence describes how much sense text makes in terms of its logical organisation and discourse flow. Even though coherence is a relatively difficult notion to quantify precisely, it can be approximated automatically. This type of…

Information Retrieval · Computer Science 2016-08-03 Christina Lioma, Fabien Tarissan, Jakob Grue Simonsen, Casper Petersen, Birger Larsen

Multilingual Distributed Representations without Word Alignment

Distributed representations of meaning are a natural way to encode covariance relationships between words and phrases in NLP. By overcoming data sparsity problems, as well as providing information about semantic relatedness which is not…

Computation and Language · Computer Science 2014-03-21 Karl Moritz Hermann, Phil Blunsom

Generative Topic Embedding: a Continuous Representation of Documents (Extended Version with Proofs)

Word embedding maps words into a low-dimensional continuous embedding space by exploiting the local word collocation patterns in a small context window. On the other hand, topic modeling maps documents onto a low-dimensional topic space, by…

Computation and Language · Computer Science 2016-08-09 Shaohua Li, Tat-Seng Chua, Jun Zhu, Chunyan Miao

Specialized Document Embeddings for Aspect-based Similarity of Research Papers

Document embeddings and similarity measures underpin content-based recommender systems, whereby a document is commonly represented as a single generic embedding. However, similarity computed on single vector representations provides only…

Information Retrieval · Computer Science 2022-03-29 Malte Ostendorff, Till Blume, Terry Ruas, Bela Gipp, Georg Rehm

Leveraging Word Embeddings for Spoken Document Summarization

Owing to the rapidly growing multimedia content available on the Internet, extractive spoken document summarization, with the purpose of automatically selecting a set of representative sentences from a spoken document to concisely express…

Computation and Language · Computer Science 2015-06-16 Kuan-Yu Chen, Shih-Hung Liu, Hsin-Min Wang, Berlin Chen, Hsin-Hsi Chen

Effective Distributed Representations for Academic Expert Search

Expert search aims to find and rank experts based on a user's query. In academia, retrieving experts is an efficient way to navigate through a large amount of academic knowledge. Here, we study how different distributed representations of…

Information Retrieval · Computer Science 2022-11-10 Mark Berger, Jakub Zavrel, Paul Groth

Document Network Projection in Pretrained Word Embedding Space

We present Regularized Linear Embedding (RLE), a novel method that projects a collection of linked documents (e.g. citation network) into a pretrained word embedding space. In addition to the textual content, we leverage a matrix of…

Information Retrieval · Computer Science 2020-01-17 Antoine Gourru, Adrien Guille, Julien Velcin, Julien Jacques

Modeling Structural Similarities between Documents for Coherence Assessment with Graph Convolutional Networks

Coherence is an important aspect of text quality, and various approaches have been applied to coherence modeling. However, existing methods solely focus on a single document's coherence patterns, ignoring the underlying correlation between…

Computation and Language · Computer Science 2023-06-13 Wei Liu, Xiyan Fu, Michael Strube

Bilingual Distributed Word Representations from Document-Aligned Comparable Data

We propose a new model for learning bilingual word representations from non-parallel document-aligned data. Following the recent advances in word representation learning, our model learns dense real-valued word vectors, that is, bilingual…

Computation and Language · Computer Science 2016-03-01 Ivan Vulić, Marie-Francine Moens

Topic Segmentation Model Focusing on Local Context

Topic segmentation is important in understanding scientific documents since it can not only provide better readability but also facilitate downstream tasks such as information retrieval and question answering by creating appropriate…

Computation and Language · Computer Science 2023-01-06 Jeonghwan Lee, Jiyeong Han, Sunghoon Baek, Min Song

Constructing Datasets for Multi-hop Reading Comprehension Across Documents

Most Reading Comprehension methods limit themselves to queries which can be answered using a single sentence, paragraph, or document. Enabling models to combine disjoint pieces of textual evidence would extend the scope of machine…

Computation and Language · Computer Science 2018-06-12 Johannes Welbl, Pontus Stenetorp, Sebastian Riedel

Distraction-Based Neural Networks for Document Summarization

Distributed representation learned with neural networks has recently been shown to be effective in modeling natural languages at fine granularities such as words, phrases, and even sentences. Whether and how such an approach can be extended to…

Computation and Language · Computer Science 2016-10-27 Qian Chen, Xiaodan Zhu, Zhenhua Ling, Si Wei, Hui Jiang

Category Enhanced Word Embedding

Distributed word representations have been demonstrated to be effective in capturing semantic and syntactic regularities. Unsupervised representation learning from large unlabeled corpora can learn similar representations for those words…

Computation and Language · Computer Science 2015-12-01 Chunting Zhou, Chonglin Sun, Zhiyuan Liu, Francis C. M. Lau

Multilingual Models for Compositional Distributed Semantics

We present a novel technique for learning semantic representations, which extends the distributional hypothesis to multilingual data and joint-space embeddings. Our models leverage parallel data and learn to strongly align the embeddings of…

Computation and Language · Computer Science 2014-04-21 Karl Moritz Hermann, Phil Blunsom