Related papers: Generalized Word Shift Graphs: A Method for Visual…
Recently, researchers started to pay attention to the detection of temporal shifts in the meaning of words. However, most (if not all) of these approaches restricted their efforts to uncovering change over time, thus neglecting other…
Neural-network-based models are a very powerful tool for creating word embeddings; the objective of these models is to group similar words together. These embeddings have been used as features to improve results in various applications such…
We show how generalized Gibbs-Shannon entropies can provide new insights on the statistical properties of texts. The universal distribution of word frequencies (Zipf's law) implies that the generalized entropies, computed at the word level,…
A common practice in Natural Language Processing (NLP) is to visualize the text corpus without reading through the entire literature, still grasping the central idea and key points described. For a long time, researchers focused on…
We describe a new method for visualizing topics, the distributions over terms that are automatically extracted from large text corpora using latent variable models. Our method finds significant $n$-grams related to a topic, which are then…
We consider the testing and estimation of change-points -- locations where the distribution abruptly changes -- in a data sequence. A new approach, based on scan statistics utilizing graphs representing the similarity between observations,…
With the availability of a virtually infinite number of text documents in digital format, automatic comparison of textual data is essential for extracting meaningful insights that are difficult to identify manually. Many existing tools,…
We propose a new approach to text semantic analysis and general corpus analysis using, as termed in this article, a "bi-gram graph" representation of a corpus. The different attributes derived from graph theory are measured and analyzed as…
The emergence and global adoption of social media has rendered possible the real-time estimation of population-scale sentiment, bearing profound implications for our understanding of human behavior. Given the growing assortment of sentiment…
Distributed representations of words as real-valued vectors in a relatively low-dimensional space aim at extracting syntactic and semantic features from large text corpora. A recently introduced neural network, named word2vec (Mikolov et…
We present a method of finding and analyzing shifts in grammatical relations found in diachronic corpora. Inspired by the econometric technique of measuring return and volatility instead of relative frequencies, we propose them as a way to…
Over the last few years, machine learning over graph structures has manifested a significant enhancement in text mining applications such as event detection, opinion mining, and news recommendation. One of the primary challenges in this…
Data Visualization has become an important aspect of big data analytics and has grown in sophistication and variety. We specifically identify the need for an analytical framework for data visualization with textual information. Data…
The performance of machine learning models relies heavily on the quality of input data, yet real-world applications often face significant data-related challenges. A common issue arises when curating training data or deploying models: two…
Temporal information has been the focus of recent attention in information extraction, leading to some standardization effort, in particular for the task of relating events in a text. This task raises the problem of comparing two…
In this paper, we propose methods for discovering semantic differences in words appearing in two corpora based on the norms of contextualized word vectors. The key idea is that the coverage of meanings is reflected in the norm of its mean…
In light of the recent success of Graph Neural Networks (GNNs) and their ability to perform inference on complex data structures, many studies apply GNNs to the task of text classification. In most previous methods, a heterogeneous graph,…
The problem of comparing two bodies of text and searching for words that differ in their usage between them arises often in digital humanities and computational social science. This is commonly approached by training word embeddings on each…
In the digital landscape, the ubiquity of data visualizations in media underscores the necessity for accessibility to ensure inclusivity for all users, including those with visual impairments. Current visual content often fails to cater to…
We describe a new method for summarizing similarities and differences in a pair of related documents using a graph representation for text. Concepts denoted by words, phrases, and proper names in the document are represented positionally as…