Evaluating Transformer-Based Multilingual Text Classification

Sophie Groenwold; Samhita Honnavalli; Lily Ou; Aesha Parekh; Sharon Levy; Diba Mirza; William Yang Wang

Computation and Language 2020-05-04 v2

Abstract

As NLP tools become ubiquitous in today's technological landscape, they are increasingly applied to languages with a variety of typological structures. However, NLP research does not focus primarily on typological differences in its analysis of state-of-the-art language models. As a result, NLP tools perform unequally across languages with different syntactic and morphological structures. Through a detailed discussion of word order typology, morphological typology, and comparative linguistics, we identify which variables most affect language modeling efficacy; in addition, we calculate word order and morphological similarity indices to aid our empirical study. We then use this background to support our analysis of an experiment we conduct using multi-class text classification on eight languages and eight models.

Keywords

natural language processing; text classification; language modeling

Cite

@article{arxiv.2004.13939,
  title   = {Evaluating Transformer-Based Multilingual Text Classification},
  author  = {Sophie Groenwold and Samhita Honnavalli and Lily Ou and Aesha Parekh and Sharon Levy and Diba Mirza and William Yang Wang},
  journal = {arXiv preprint arXiv:2004.13939},
  year    = {2020}
}

Comments

Total of 15 pages (9 pages for paper, 2 pages for references, 4 pages for appendix). Title changed.
