Related papers: Exploring text datasets by visualizing relevant wo…

Discovering topics in text datasets by visualizing relevant words

When dealing with large collections of documents, it is imperative to quickly get an overview of the texts' contents. In this paper we show how this can be achieved by using a clustering algorithm to identify topics in the dataset and then…

Computation and Language · Computer Science · 2017-07-20 · Franziska Horn, Leila Arras, Grégoire Montavon, Klaus-Robert Müller, Wojciech Samek

Document Visualization using Topic Clouds

Traditionally a document is visualized by a word cloud. Recently, distributed representation methods for documents have been developed, which map a document to a set of topic embeddings. Visualizing such a representation is useful to…

Information Retrieval · Computer Science · 2017-02-07 · Shaohua Li, Tat-Seng Chua

Using word clouds for fast identification of papers' subject domain and reviewers' competences

Generating word (tag) clouds is a powerful data visualization technique that lets people quickly become acquainted with the content of a large collection of textual documents and identify their subject domains in a matter of seconds,…

Information Retrieval · Computer Science · 2022-01-03 · Yordan Kalmukov

Decoding the Text Encoding

Word clouds and text visualizations are among the most popular and widely used types of visualization in recent years. Despite the attractiveness and simplicity of producing word clouds, they do not provide a thorough visualization for the…

Human-Computer Interaction · Computer Science · 2014-12-19 · Fereshteh Sadeghi, Hamid Izadinia

Navigating multilingual news collections using automatically extracted information

We present a text analysis tool set that allows analysts in various fields to sift through large collections of multilingual news items quickly and to find information that is relevant to them. For a given document collection,…

Computation and Language · Computer Science · 2007-05-23 · Ralf Steinberger, Bruno Pouliquen, Camelia Ignat

Document Retrieval for Large Scale Content Analysis using Contextualized Dictionaries

This paper presents a procedure to retrieve subsets of relevant documents from large text collections for Content Analysis, e.g. in social sciences. Document retrieval for this purpose needs to take account of the fact that analysts often…

Information Retrieval · Computer Science · 2017-07-12 · Gregor Wiedemann, Andreas Niekler

A Novel Approach to Document Classification using WordNet

Content-based document classification is one of the biggest challenges in free-text mining. Current document classification algorithms mostly rely on cluster analysis based on the bag-of-words approach. However, that method is…

Information Retrieval · Computer Science · 2015-12-15 · Koushiki Sarkar, Ritwika Law

Extraction of Salient Sentences from Labelled Documents

We present a hierarchical convolutional document model with an architecture designed to support introspection of the document structure. Using this model, we show how to use visualisation techniques from the computer vision literature to…

Computation and Language · Computer Science · 2015-03-03 · Misha Denil, Alban Demiraj, Nando de Freitas

RaKUn: Rank-based Keyword extraction via Unsupervised learning and Meta vertex aggregation

Keyword extraction is used for summarizing the content of a document and supports efficient document retrieval, and is as such an indispensable part of modern text-based systems. We explore how load centrality, a graph-theoretic measure…

Computation and Language · Computer Science · 2019-11-12 · Blaž Škrlj, Andraž Repar, Senja Pollak

Graph-based Semantical Extractive Text Analysis

In the past few decades, there has been an explosion in the amount of available data produced from various sources on different topics. The availability of this enormous amount of data requires us to adopt effective computational tools to…

Computation and Language · Computer Science · 2022-12-20 · Mina Samizadeh

Word Storms: Multiples of Word Clouds for Visual Comparison of Documents

Word clouds are a popular tool for visualizing documents, but they are not a good tool for comparing documents, because identical words are not presented consistently across different clouds. We introduce the concept of word storms, a…

Information Retrieval · Computer Science · 2013-01-04 · Quim Castella, Charles Sutton

Replication of the Keyword Extraction part of the paper "'Without the Clutter of Unimportant Words': Descriptive Keyphrases for Text Visualization"

"Keyword Extraction" refers to the task of automatically identifying the most relevant and informative phrases in natural language text. As we are deluged with large amounts of text data in many different forms and kinds of content: emails, blogs,…

Computation and Language · Computer Science · 2019-08-22 · Shibamouli Lahiri

Information Retrieval in long documents: Word clustering approach for improving Semantics

In this paper, we propose an alternative to deep neural networks for semantic information retrieval in the case of long documents. This new approach exploits clustering techniques to take into account the meaning of words in Information…

Information Retrieval · Computer Science · 2025-07-29 · Paul Mbathe Mekontchou, Armel Fotsoh, Bernabe Batchakui, Eddy Ella

Worbel: Aggregating Point Labels into Word Clouds

Point feature labeling is a classical problem in cartography and GIS that has been extensively studied for geospatial point data. At the same time, word clouds are a popular visualization tool to show the most important words in text data…

Computational Geometry · Computer Science · 2021-09-10 · Sujoy Bhore, Robert Ganian, Guangping Li, Martin Nöllenburg, Jules Wulms

From Image to Text Classification: A Novel Approach based on Clustering Word Embeddings

In this paper, we propose a novel approach for text classification based on clustering word embeddings, inspired by the bag of visual words model, which is widely used in computer vision. After each word in a collection of documents is…

Computation and Language · Computer Science · 2017-07-26 · Andrei M. Butnaru, Radu Tudor Ionescu

Keywords lie far from the mean of all words in local vector space

Keyword extraction is an important document-processing task that aims at finding a small set of terms that concisely describe a document's topics. The most popular state-of-the-art unsupervised approaches belong to the family of the graph-based…

Computation and Language · Computer Science · 2020-08-24 · Eirini Papagiannopoulou, Grigorios Tsoumakas, Apostolos N. Papadopoulos

A Semantic approach for effective document clustering using WordNet

Nowadays, the number of text documents on the internet, in e-mail and on web pages is growing steadily, and these documents are stored in electronic databases. Arranging and browsing them becomes difficult. To overcome this problem, the…

Computation and Language · Computer Science · 2013-03-05 · Leena H. Patil, Mohammed Atique

A tool set for the quick and efficient exploration of large document collections

We present a set of multilingual text analysis tools that can help analysts in any field to explore large document collections quickly in order to determine whether the documents contain information of interest, and to find the…

Computation and Language · Computer Science · 2007-05-23 · Camelia Ignat, Bruno Pouliquen, Ralf Steinberger, Tomaz Erjavec

Using Genetic Algorithms for Texts Classification Problems

The avalanche of information produced by mankind has led to the concept of automated knowledge extraction, known as Data Mining ([1]). This field involves a wide spectrum of problems, from recognition of the fuzzy set to…

Machine Learning · Computer Science · 2009-06-05 · A. A. Shumeyko, S. L. Sotnik

Document Clustering based on Topic Maps

The importance of document clustering is now widely acknowledged by researchers for better management, smart navigation, efficient filtering, and concise summarization of large collections of documents such as the World Wide Web (WWW). The next…

Information Retrieval · Computer Science · 2011-12-30 · Muhammad Rafi, M. Shahid Shaikh, Amir Farooq