Scattertext: a Browser-Based Tool for Visualizing how Corpora Differ

Computation and Language 2017-04-24 v3 Information Retrieval

Abstract

Scattertext is an open-source tool for visualizing linguistic variation between document categories in a language-independent way. The tool presents a scatterplot in which each axis corresponds to the rank-frequency with which a term occurs in a category of documents. Through a tie-breaking strategy, the tool is able to display thousands of visible term-representing points and find space to legibly label hundreds of them. Scattertext also lends itself to a query-based visualization of how the use of terms with similar embeddings differs between document categories, as well as a visualization for comparing the importance scores of bag-of-words features to univariate metrics.
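To make the axis definition concrete, the following is a minimal sketch of how rank-frequency coordinates could be computed for two document categories. It is not the Scattertext implementation: the function names `dense_ranks` and `rank_frequency_coords`, the whitespace tokenizer, and the use of dense ranking scaled to [0, 1] are all assumptions made for illustration, and the tie-breaking strategy mentioned in the abstract is not modeled.

```python
from collections import Counter


def dense_ranks(counts, vocab):
    """Map each term to its dense frequency rank, scaled to [0, 1].

    Terms with equal counts share a rank (hypothetical helper, not
    part of the Scattertext API).
    """
    distinct = sorted({counts.get(term, 0) for term in vocab})
    rank_of = {freq: i for i, freq in enumerate(distinct)}
    top = max(len(distinct) - 1, 1)  # avoid division by zero
    return {term: rank_of[counts.get(term, 0)] / top for term in vocab}


def rank_frequency_coords(docs_a, docs_b):
    """Return {term: (x, y)} where x and y are the term's scaled
    rank-frequency in category A and category B respectively."""
    # Naive lowercase whitespace tokenization, assumed for brevity.
    counts_a = Counter(tok for doc in docs_a for tok in doc.lower().split())
    counts_b = Counter(tok for doc in docs_b for tok in doc.lower().split())
    vocab = sorted(set(counts_a) | set(counts_b))
    xs = dense_ranks(counts_a, vocab)
    ys = dense_ranks(counts_b, vocab)
    return {term: (xs[term], ys[term]) for term in vocab}


coords = rank_frequency_coords(
    ["tax cuts create jobs", "cuts help jobs"],
    ["health care for all", "care creates jobs"],
)
for term, (x, y) in sorted(coords.items()):
    print(f"{term:8s} x={x:.2f} y={y:.2f}")
```

In this toy input, terms used mainly in the first category land near the lower-right corner, terms used mainly in the second land near the upper-left, and terms frequent in both sit toward the upper-right.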

Cite

@article{arxiv.1703.00565,
  title  = {Scattertext: a Browser-Based Tool for Visualizing how Corpora Differ},
  author = {Jason S. Kessler},
  journal= {arXiv preprint arXiv:1703.00565},
  year   = {2017}
}

Comments

ACL 2017 Demos. 6 pages, 5 figures. See the GitHub repo https://github.com/JasonKessler/scattertext for source code and documentation.