Text Characterization Toolkit

Computation and Language 2022-10-05 v1 Machine Learning

Abstract

In NLP, models are usually evaluated by reporting single-number performance scores on a number of readily available benchmarks, with little deeper analysis. We argue that, especially given the well-known fact that benchmarks often contain biases, artefacts, and spurious correlations, deeper analysis of results should become the de facto standard when presenting new models or benchmarks. We present a tool that researchers can use to study properties of a dataset and the influence of those properties on their models' behaviour. Our Text Characterization Toolkit includes both an easy-to-use annotation tool and off-the-shelf scripts that can be used for specific analyses. We also present use cases from three different domains: we use the tool to predict which examples are difficult for given well-known trained models, and to identify (potentially harmful) biases and heuristics that are present in a dataset.

Cite

@article{arxiv.2210.01734,
  title  = {Text Characterization Toolkit},
  author = {Daniel Simig and Tianlu Wang and Verna Dankers and Peter Henderson and Khuyagbaatar Batsuren and Dieuwke Hupkes and Mona Diab},
  journal= {arXiv preprint arXiv:2210.01734},
  year   = {2022}
}