Related papers: Thematically Reinforced Explicit Semantic Analysis

Wikipedia Arborification and Stratified Explicit Semantic Analysis

[This is the translation of paper "Arborification de Wikipédia et analyse sémantique explicite stratifiée" submitted to TALN 2012.] We present an extension of the Explicit Semantic Analysis method by Gabrilovich and Markovitch. Using…

Computation and Language · Computer Science 2012-02-03 Yannis Haralambous, Vitaly Klyuev

Introducing Inter-Relatedness between Wikipedia Articles in Explicit Semantic Analysis

Explicit Semantic Analysis (ESA) is a technique used to represent a piece of text as a vector in the space of concepts, such as Articles found in Wikipedia. We propose a methodology to incorporate knowledge of Inter-relatedness between…

Computation and Language · Computer Science 2020-12-02 Naveen Elango, Pawan Prasad K

Wikipedia-based Semantic Interpretation for Natural Language Processing

Adequate representation of natural language semantics requires access to vast amounts of common sense and domain-specific world knowledge. Prior work in the field was based on purely statistical techniques that did not make use of…

Computation and Language · Computer Science 2014-01-23 Evgeniy Gabrilovich, Shaul Markovitch

We describe a new semantic relatedness measure combining the Wikipedia-based Explicit Semantic Analysis measure, the WordNet path measure and the mixed collocation index. Our measure achieves the currently highest results on the WS-353…

Computation and Language · Computer Science 2011-08-23 Yannis Haralambous, Vitaly Klyuev

Assessing Wikipedia-Based Cross-Language Retrieval Models

This work compares concept models for cross-language retrieval: First, we adapt probabilistic Latent Semantic Analysis (pLSA) for multilingual documents. Experiments with different weighting schemes show that a weighting method favoring…

Information Retrieval · Computer Science 2014-01-13 Benjamin Roth

Hierarchical thematic classification of major conference proceedings

In this paper, we develop a decision support system for the hierarchical text classification. We consider text collections with a fixed hierarchical structure of topics given by experts in the form of a tree. The system sorts the topics by…

Machine Learning · Computer Science 2024-06-24 Arsentii Kuzmin, Alexander Aduenko, Vadim Strijov

Topic Aware Contextualized Embeddings for High Quality Phrase Extraction

Keyphrase extraction from a given document is the task of automatically extracting salient phrases that best describe the document. This paper proposes a novel unsupervised graph-based ranking method to extract high-quality phrases from a…

Information Retrieval · Computer Science 2022-01-27 Venktesh V, Mukesh Mohania, Vikram Goyal

Exploring semantically-related concepts from Wikipedia: the case of SeRE

In this paper we present our web application SeRE designed to explore semantically related concepts. Wikipedia and DBpedia are rich data sources to extract related entities for a given topic, like in- and out-links, broader and narrower…

Computation and Language · Computer Science 2015-04-28 Daniel Hienert, Dennis Wegener, Siegfried Schomisch

EEF: Exponentially Embedded Families with Class-Specific Features for Classification

In this letter, we present a novel exponentially embedded families (EEF) based classification method, in which the probability density function (PDF) on raw data is estimated from the PDF on features. With the PDF construction, we show that…

Machine Learning · Statistics 2016-08-24 Bo Tang, Steven Kay, Haibo He, Paul M. Baggenstoss

Semantic Sensitive TF-IDF to Determine Word Relevance in Documents

Keyword extraction has received an increasing attention as an important research topic which can lead to have advancements in diverse applications such as document context categorization, text indexing and document classification. In this…

Information Retrieval · Computer Science 2021-01-27 Amir Jalilifard, Vinicius F. Caridá, Alex F. Mansano, Rogers S. Cristo, Felipe Penhorate C. da Fonseca

Automatically detecting scientific political science texts from a large general document index

This technical report outlines the filtering approach applied to the collection of the Bielefeld Academic Search Engine (BASE) data to extract articles from the political science domain. We combined hard and soft filters to address entries…

Digital Libraries · Computer Science 2024-06-25 Nina Smirnova

Temporal Analysis of Entity Relatedness and its Evolution using Wikipedia and DBpedia

Many researchers have made use of the Wikipedia network for relatedness and similarity tasks. However, most approaches use only the most recent information and not historical changes in the network. We provide an analysis of entity…

Computation and Language · Computer Science 2018-12-13 Narumol Prangnawarat, John P. McCrae, Conor Hayes

Assessing the behavior and performance of a supervised term-weighting technique for topic-based retrieval

This article analyses and evaluates FDDβ, a supervised term-weighting scheme that can be applied for query-term selection in topic-based retrieval. FDDβ weights terms based on two factors representing the descriptive and…

Information Retrieval · Computer Science 2020-07-20 Mariano Maisonnave, Fernando Delbianco, Fernando Tohmé, Ana Maguitman

Combining Word Embeddings and N-grams for Unsupervised Document Summarization

Graph-based extractive document summarization relies on the quality of the sentence similarity graph. Bag-of-words or tf-idf based sentence similarity uses exact word matching, but fails to measure the semantic similarity between individual…

Computation and Language · Computer Science 2020-04-30 Zhuolin Jiang, Manaj Srivastava, Sanjay Krishna, David Akodes, Richard Schwartz

Using RDF Summary Graph For Keyword-based Semantic Searches

The Semantic Web began to emerge as its standards and technologies developed rapidly in the recent years. The continuing development of Semantic Web technologies has facilitated publishing explicit semantics with data on the Web in RDF data…

Artificial Intelligence · Computer Science 2017-07-13 Serkan Ayvaz, Mehmet Aydar

Query Expansion: Term Selection using the EWC Semantic Relatedness Measure

This paper investigates the efficiency of the EWC semantic relatedness measure in an ad-hoc retrieval task. This measure combines the Wikipedia-based Explicit Semantic Analysis measure, the WordNet path measure and the mixed collocation…

Computation and Language · Computer Science 2011-08-23 Vitaly Klyuev, Yannis Haralambous

Document Embedding for Scientific Articles: Efficacy of Word Embeddings vs TFIDF

Over the last few years, neural network derived word embeddings became popular in the natural language processing literature. Studies conducted have mostly focused on the quality and application of word embeddings trained on public…

Artificial Intelligence · Computer Science 2021-07-13 H. J. Meijer, J. Truong, R. Karimi

Exploiting Text Semantics for Few and Zero Shot Node Classification on Text-attributed Graph

Text-attributed graph (TAG) provides a text description for each graph node, and few- and zero-shot node classification on TAGs have many applications in fields such as academia and social networks. Existing work utilizes various…

Computation and Language · Computer Science 2025-05-14 Yuxiang Wang, Xiao Yan, Shiyu Jin, Quanqing Xu, Chuang Hu, Yuanyuan Zhu, Bo Du, Jia Wu, Jiawei Jiang

Multi-class Multilingual Classification of Wikipedia Articles Using Extended Named Entity Tag Set

Wikipedia is a great source of general world knowledge which can guide NLP models better understand their motivation to make predictions. Structuring Wikipedia is the initial step towards this goal which can facilitate fine-grain…

Computation and Language · Computer Science 2020-03-09 Hassan S. Shavarani, Satoshi Sekine

Leveraging Topic Specificity and Social Relationships for Expert Finding in Community Question Answering Platforms

Online Community Question Answering (CQA) platforms have become indispensable tools for users seeking expert solutions to their technical queries. The effectiveness of these platforms relies on their ability to identify and direct questions…

Information Retrieval · Computer Science 2024-07-08 Maddalena Amendola, Andrea Passarella, Raffaele Perego