Related papers: Generalized Word Shift Graphs: A Method for Visual…

Words are Malleable: Computing Semantic Shifts in Political and Media Discourse

Recently, researchers started to pay attention to the detection of temporal shifts in the meaning of words. However, most (if not all) of these approaches restricted their efforts to uncovering change over time, thus neglecting other…

Computation and Language · Computer Science 2017-11-16 Hosein Azarbonyad, Mostafa Dehghani, Kaspar Beelen, Alexandra Arkut, Maarten Marx, Jaap Kamps

Visualizing Linguistic Shift

Neural network based models are a very powerful tool for creating word embeddings, the objective of these models is to group similar words together. These embeddings have been used as features to improve results in various applications such…

Computation and Language · Computer Science 2016-11-27 Salman Mahmood, Rami Al-Rfou, Klaus Mueller

Generalized Entropies and the Similarity of Texts

We show how generalized Gibbs-Shannon entropies can provide new insights on the statistical properties of texts. The universal distribution of word frequencies (Zipf's law) implies that the generalized entropies, computed at the word level,…

Physics and Society · Physics 2017-02-15 Eduardo G. Altmann, Laercio Dias, Martin Gerlach

CompText: Visualizing, Comparing & Understanding Text Corpus

A common practice in Natural Language Processing (NLP) is to visualize the text corpus without reading through the entire literature, still grasping the central idea and key points described. For a long time, researchers focused on…

Computation and Language · Computer Science 2022-07-29 Suvi Varshney, Divjeet Singh Jas

Visualizing Topics with Multi-Word Expressions

We describe a new method for visualizing topics, the distributions over terms that are automatically extracted from large text corpora using latent variable models. Our method finds significant $n$-grams related to a topic, which are then…

Machine Learning · Statistics 2009-07-07 David M. Blei, John D. Lafferty

Graph-Based Change-Point Detection

We consider the testing and estimation of change-points -- locations where the distribution abruptly changes -- in a data sequence. A new approach, based on scan statistics utilizing graphs representing the similarity between observations,…

Methodology · Statistics 2015-02-18 Hao Chen, Nancy Zhang

Explainable identification of similarities between entities for discovery in large text

With the availability of virtually infinite number text documents in digital format, automatic comparison of textual data is essential for extracting meaningful insights that are difficult to identify manually. Many existing tools,…

Information Retrieval · Computer Science 2025-03-25 Akhil Joshi, Sai Teja Erukude, Lior Shamir

On Bi-gram Graph Attributes

We propose a new approach to text semantic analysis and general corpus analysis using, as termed in this article, a "bi-gram graph" representation of a corpus. The different attributes derived from graph theory are measured and analyzed as…

Machine Learning · Computer Science 2021-07-30 Thomas Konstantinovsky, Matan Mizrachi

Benchmarking sentiment analysis methods for large-scale texts: A case for using continuum-scored words and word shift graphs

The emergence and global adoption of social media has rendered possible the real-time estimation of population-scale sentiment, bearing profound implications for our understanding of human behavior. Given the growing assortment of sentiment…

Computation and Language · Computer Science 2016-09-08 Andrew J. Reagan, Brian Tivnan, Jake Ryland Williams, Christopher M. Danforth, Peter Sheridan Dodds

Measuring Word Significance using Distributed Representations of Words

Distributed representations of words as real-valued vectors in a relatively low-dimensional space aim at extracting syntactic and semantic features from large text corpora. A recently introduced neural network, named word2vec (Mikolov et…

Computation and Language · Computer Science 2015-08-11 Adriaan M. J. Schakel, Benjamin J. Wilson

Diachronic Variation in Grammatical Relations

We present a method of finding and analyzing shifts in grammatical relations found in diachronic corpora. Inspired by the econometric technique of measuring return and volatility instead of relative frequencies, we propose them as a way to…

Computation and Language · Computer Science 2012-12-14 Aaron Gerow, Khurshid Ahmad

Leveraging Deep Graph-Based Text Representation for Sentiment Polarity Applications

Over the last few years, machine learning over graph structures has manifested a significant enhancement in text mining applications such as event detection, opinion mining, and news recommendation. One of the primary challenges in this…

Computation and Language · Computer Science 2019-11-26 Kayvan Bijari, Hadi Zare, Emad Kebriaei, Hadi Veisi

A Picture for the Words! Textual Visualization in Big Data Analytics

Data Visualization has become an important aspect of big data analytics and has grown in sophistication and variety. We specifically identify the need for an analytical framework for data visualization with textual information. Data…

Social and Information Networks · Computer Science 2020-05-19 Cherilyn Conner, Jim Samuel, Andrey Kretinin, Yana Samuel, Lee Nadeau

"What is Different Between These Datasets?" A Framework for Explaining Data Distribution Shifts

The performance of machine learning models relies heavily on the quality of input data, yet real-world applications often face significant data-related challenges. A common issue arises when curating training data or deploying models: two…

Machine Learning · Computer Science 2025-09-24 Varun Babbar, Zhicheng Guo, Cynthia Rudin

Evaluating Temporal Graphs Built from Texts via Transitive Reduction

Temporal information has been the focus of recent attention in information extraction, leading to some standardization effort, in particular for the task of relating events in a text. This task raises the problem of comparing two…

Computation and Language · Computer Science 2014-01-17 Xavier Tannier, Philippe Muller

Contextualized Word Vector-based Methods for Discovering Semantic Differences with No Training nor Word Alignment

In this paper, we propose methods for discovering semantic differences in words appearing in two corpora based on the norms of contextualized word vectors. The key idea is that the coverage of meanings is reflected in the norm of its mean…

Computation and Language · Computer Science 2023-05-22 Ryo Nagata, Hiroya Takamura, Naoki Otani, Yoshifumi Kawasaki

Graph Neural Networks on Discriminative Graphs of Words

In light of the recent success of Graph Neural Networks (GNNs) and their ability to perform inference on complex data structures, many studies apply GNNs to the task of text classification. In most previous methods, a heterogeneous graph,…

Machine Learning · Computer Science 2024-10-29 Yassine Abbahaddou, Johannes F. Lutzeyer, Michalis Vazirgiannis

Simple, Interpretable and Stable Method for Detecting Words with Usage Change across Corpora

The problem of comparing two bodies of text and searching for words that differ in their usage between them arises often in digital humanities and computational social science. This is commonly approached by training word embeddings on each…

Computation and Language · Computer Science 2021-12-30 Hila Gonen, Ganesh Jawahar, Djamé Seddah, Yoav Goldberg

From Graphs to Words: A Computer-Assisted Framework for the Production of Accessible Text Descriptions

In the digital landscape, the ubiquity of data visualizations in media underscores the necessity for accessibility to ensure inclusivity for all users, including those with visual impairments. Current visual content often fails to cater to…

Human-Computer Interaction · Computer Science 2024-09-27 Qiang Xu, Thomas Hurtut

Multi-document Summarization by Graph Search and Matching

We describe a new method for summarizing similarities and differences in a pair of related documents using a graph representation for text. Concepts denoted by words, phrases, and proper names in the document are represented positionally as…

cmp-lg · Computer Science 2007-05-23 Inderjeet Mani, Eric Bloedorn