Related papers: Thematically Reinforced Explicit Semantic Analysis
[This is the translation of paper "Arborification de Wikip\'edia et analyse s\'emantique explicite stratifi\'ee" submitted to TALN 2012.] We present an extension of the Explicit Semantic Analysis method by Gabrilovich and Markovitch. Using…
Explicit Semantic Analysis (ESA) is a technique used to represent a piece of text as a vector in the space of concepts, such as Articles found in Wikipedia. We propose a methodology to incorporate knowledge of Inter-relatedness between…
Adequate representation of natural language semantics requires access to vast amounts of common sense and domain-specific world knowledge. Prior work in the field was based on purely statistical techniques that did not make use of…
We describe a new semantic relatedness measure combining the Wikipedia-based Explicit Semantic Analysis measure, the WordNet path measure and the mixed collocation index. Our measure achieves the currently highest results on the WS-353…
This work compares concept models for cross-language retrieval: First, we adapt probabilistic Latent Semantic Analysis (pLSA) for multilingual documents. Experiments with different weighting schemes show that a weighting method favoring…
In this paper, we develop a decision support system for the hierarchical text classification. We consider text collections with a fixed hierarchical structure of topics given by experts in the form of a tree. The system sorts the topics by…
Keyphrase extraction from a given document is the task of automatically extracting salient phrases that best describe the document. This paper proposes a novel unsupervised graph-based ranking method to extract high-quality phrases from a…
In this paper we present our web application SeRE designed to explore semantically related concepts. Wikipedia and DBpedia are rich data sources to extract related entities for a given topic, like in- and out-links, broader and narrower…
In this letter, we present a novel exponentially embedded families (EEF) based classification method, in which the probability density function (PDF) on raw data is estimated from the PDF on features. With the PDF construction, we show that…
Keyword extraction has received an increasing attention as an important research topic which can lead to have advancements in diverse applications such as document context categorization, text indexing and document classification. In this…
This technical report outlines the filtering approach applied to the collection of the Bielefeld Academic Search Engine (BASE) data to extract articles from the political science domain. We combined hard and soft filters to address entries…
Many researchers have made use of the Wikipedia network for relatedness and similarity tasks. However, most approaches use only the most recent information and not historical changes in the network. We provide an analysis of entity…
This article analyses and evaluates FDD\b{eta}, a supervised term-weighting scheme that can be applied for query-term selection in topic-based retrieval. FDD\b{eta} weights terms based on two factors representing the descriptive and…
Graph-based extractive document summarization relies on the quality of the sentence similarity graph. Bag-of-words or tf-idf based sentence similarity uses exact word matching, but fails to measure the semantic similarity between individual…
The Semantic Web began to emerge as its standards and technologies developed rapidly in the recent years. The continuing development of Semantic Web technologies has facilitated publishing explicit semantics with data on the Web in RDF data…
This paper investigates the efficiency of the EWC semantic relatedness measure in an ad-hoc retrieval task. This measure combines the Wikipedia-based Explicit Semantic Analysis measure, the WordNet path measure and the mixed collocation…
Over the last few years, neural network derived word embeddings became popular in the natural language processing literature. Studies conducted have mostly focused on the quality and application of word embeddings trained on public…
Text-attributed graph (TAG) provides a text description for each graph node, and few- and zero-shot node classification on TAGs have many applications in fields such as academia and social networks. Existing work utilizes various…
Wikipedia is a great source of general world knowledge which can guide NLP models better understand their motivation to make predictions. Structuring Wikipedia is the initial step towards this goal which can facilitate fine-grain…
Online Community Question Answering (CQA) platforms have become indispensable tools for users seeking expert solutions to their technical queries. The effectiveness of these platforms relies on their ability to identify and direct questions…