Related papers: hyperdoc2vec: Distributed Representations of Hyper…

200 papers

Recently, Le and Mikolov (2014) proposed doc2vec as an extension to word2vec (Mikolov et al., 2013a) to learn document-level embeddings. Despite promising results in the original paper, others have struggled to reproduce those results. This…

Computation and Language · Computer Science 2016-12-19 Jey Han Lau, Timothy Baldwin

Tagging news articles or blog posts with relevant tags from a collection of predefined ones is coined as document tagging in this work. Accurate tagging of articles can benefit several downstream applications such as recommendation and…

Computation and Language · Computer Science 2017-07-18 Sheng Chen, Akshay Soni, Aasish Pappu, Yashar Mehdad

Previous studies have demonstrated the empirical success of word embeddings in various applications. In this paper, we investigate the problem of learning distributed representations for text documents which many machine learning algorithms…

Computation and Language · Computer Science 2017-09-29 Bin Bi, Hao Ma

The number of academic papers being published has been increasing exponentially in recent years, and recommending adequate citations to assist researchers in writing papers is a non-trivial task. Conventional approaches may not be optimal, as the…

Information Retrieval · Computer Science 2020-01-09 Yang Zhang, Qiang Ma

We present an efficient document representation learning framework, Document Vector through Corruption (Doc2VecC). Doc2VecC represents each document as a simple average of word embeddings. It ensures a representation generated as such…

Computation and Language · Computer Science 2017-07-11 Minmin Chen

Distributed document representation is one of the basic problems in natural language processing. Currently distributed document representation methods mainly consider the context information of words or sentences. These methods do not take…

Computation and Language · Computer Science 2022-01-11 Shicheng Tan, Shu Zhao, Yanping Zhang

Paragraph Vectors has been recently proposed as an unsupervised method for learning distributed representations for pieces of texts. In their work, the authors showed that the method can learn an embedding of movie review texts which can be…

Computation and Language · Computer Science 2015-07-30 Andrew M. Dai, Christopher Olah, Quoc V. Le

Conventional text classification models make a bag-of-words assumption reducing text into word occurrence counts per document. Recent algorithms such as word2vec are capable of learning semantic meaning and similarity between words in an…

Computation and Language · Computer Science 2018-07-11 Vincent Major, Alisa Surkis, Yindalon Aphinyanaphongs

Nowadays, search engine users commonly rely on query suggestions to improve their initial inputs. Current systems are very good at recommending lexical adaptations or spelling corrections to users' queries. However, they often struggle to…

Information Retrieval · Computer Science 2023-01-24 Jorge Gabín, M. Eduardo Ares, Javier Parapar

Due to the availability of references of research papers and the rich information contained in papers, various citation analysis approaches have been proposed to identify similar documents for scholar recommendation. Despite the success…

Information Retrieval · Computer Science 2017-03-21 Han Tian, Hankz Hankui Zhuo

Many recent document embedding models are trained on document-as-image representations, embedding rendered pages as images rather than the underlying source. Meanwhile, existing benchmarks for scientific document retrieval, such as ArXivQA…

Information Retrieval · Computer Science 2026-04-21 Ghazal Khalighinejad, Raghuveer Thirukovalluru, Alexander H. Oh, Bhuwan Dhingra

The continually increasing number of documents produced each year necessitates ever improving information processing methods for searching, retrieving, and organizing text. Central to these information processing methods is document…

Expert search aims to find and rank experts based on a user's query. In academia, retrieving experts is an efficient way to navigate through a large amount of academic knowledge. Here, we study how different distributed representations of…

Information Retrieval · Computer Science 2022-11-10 Mark Berger, Jakub Zavrel, Paul Groth

Topic modeling is used for discovering latent semantic structure, usually referred to as topics, in a large collection of documents. The most widely used methods are Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis.…

Computation and Language · Computer Science 2020-08-24 Dimo Angelov

Vector representations of graphs and relational structures, whether hand-crafted feature vectors or learned representations, enable us to apply standard data analysis and machine learning techniques to the structures. A wide range of…

Machine Learning · Computer Science 2020-03-31 Martin Grohe

In this paper we perform a comparative analysis of three models for feature representation of text documents in the context of document classification. In particular, we consider the most often used family of models bag-of-words, recently…

Computation and Language · Computer Science 2017-07-06 Sanda Martinčić-Ipšić, Tanja Miličić, Ljupčo Todorovski

A fundamental goal of search engines is to identify, given a query, documents that have relevant text. This is intrinsically difficult because the query and the document may use different vocabulary, or the document may contain query words…

Information Retrieval · Computer Science 2016-02-04 Bhaskar Mitra, Eric Nalisnick, Nick Craswell, Rich Caruana

Latent semantic representations of words or paragraphs, namely the embeddings, have been widely applied to information retrieval (IR). One of the common approaches of utilizing embeddings for IR is to estimate the document-to-query (D2Q)…

Information Retrieval · Computer Science 2017-08-11 Chenhao Yang, Ben He, Yanhua Ran

Multimodal embedding models have been crucial in enabling various downstream tasks such as semantic similarity, information retrieval, and clustering over different modalities. However, existing multimodal embeddings like VLM2Vec, E5-V, GME…

Computer Vision and Pattern Recognition · Computer Science 2025-07-08 Rui Meng, Ziyan Jiang, Ye Liu, Mingyi Su, Xinyi Yang, Yuepeng Fu, Can Qin, Zeyuan Chen, Ran Xu, Caiming Xiong, Yingbo Zhou, Wenhu Chen, Semih Yavuz

We propose in this paper a new, hybrid document embedding approach in order to address the problem of document similarities with respect to the technical content. To do so, we employ state-of-the-art graph techniques to first extract the…

Computation and Language · Computer Science 2019-07-02 Hamid Mirisaee, Eric Gaussier, Cedric Lagnier, Agnes Guerraz