English

Zero-shot Code-Mixed Offensive Span Identification through Rationale Extraction

Computation and Language 2022-05-13 v1 Machine Learning

Abstract

This paper investigates the effectiveness of sentence-level transformers for zero-shot offensive span identification on a code-mixed Tamil dataset. More specifically, we evaluate rationale extraction methods of Local Interpretable Model Agnostic Explanations (LIME) \cite{DBLP:conf/kdd/Ribeiro0G16} and Integrated Gradients (IG) \cite{DBLP:conf/icml/SundararajanTY17} for adapting transformer based offensive language classification models for zero-shot offensive span identification. To this end, we find that LIME and IG show baseline F1F_{1} of 26.35\% and 44.83\%, respectively. Besides, we study the effect of data set size and training process on the overall accuracy of span identification. As a result, we find both LIME and IG to show significant improvement with Masked Data Augmentation and Multilabel Training, with F1F_{1} of 50.23\% and 47.38\% respectively. \textit{Disclaimer : This paper contains examples that may be considered profane, vulgar, or offensive. The examples do not represent the views of the authors or their employers/graduate schools towards any person(s), group(s), practice(s), or entity/entities. Instead they are used to emphasize only the linguistic research challenges.}

Cite

@article{arxiv.2205.06119,
  title  = {Zero-shot Code-Mixed Offensive Span Identification through Rationale Extraction},
  author = {Manikandan Ravikiran and Bharathi Raja Chakravarthi},
  journal= {arXiv preprint arXiv:2205.06119},
  year   = {2022}
}

Comments

Submission to https://dravidianlangtech.github.io/2022/

R2 v1 2026-06-24T11:15:32.920Z