Multi-Granular Text Encoding for Self-Explaining Categorization

Computation and Language 2019-07-22 v1 Machine Learning

Abstract

Self-explaining text categorization requires a classifier to make a prediction along with supporting evidence. One popular type of evidence consists of sub-sequences extracted from the input text that are sufficient for the classifier to make the prediction. In this work, we define multi-granular ngrams as basic units for explanation, and organize all ngrams into a hierarchical structure, so that shorter ngrams can be reused while computing longer ngrams. We leverage a tree-structured LSTM to learn a context-independent representation for each unit via parameter sharing. Experiments on medical disease classification show that our model is more accurate, efficient, and compact than BiLSTM and CNN baselines. More importantly, our model can extract intuitive multi-granular evidence to support its predictions.
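
The reuse of shorter ngrams when computing longer ones can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `build_ngram_hierarchy` and `encode_bottom_up` are hypothetical, the choice of giving each ngram its two overlapping (n-1)-gram children is one plausible hierarchy, and a plain tuple concatenation stands in for the learned tree-structured LSTM composition.

```python
from typing import Dict, List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def build_ngram_hierarchy(
    tokens: List[str], max_n: int
) -> Dict[Span, Optional[Tuple[Span, Span]]]:
    """Map every ngram span (n <= max_n) to its two (n-1)-gram children.

    Unigrams are leaves and map to None. The two-children layout is an
    illustrative assumption, not necessarily the paper's exact structure.
    """
    hierarchy: Dict[Span, Optional[Tuple[Span, Span]]] = {}
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            end = start + n
            if n == 1:
                hierarchy[(start, end)] = None
            else:
                # Left child drops the last token, right child drops the first.
                hierarchy[(start, end)] = ((start, end - 1), (start + 1, end))
    return hierarchy


def encode_bottom_up(
    tokens: List[str], hierarchy: Dict[Span, Optional[Tuple[Span, Span]]]
) -> Dict[Span, Tuple[str, ...]]:
    """Encode each span once, shortest first, reusing its children's results.

    Tuple concatenation is a placeholder for a learned composition function
    whose parameters would be shared across all nodes.
    """
    encoded: Dict[Span, Tuple[str, ...]] = {}
    for span in sorted(hierarchy, key=lambda s: (s[1] - s[0], s[0])):
        children = hierarchy[span]
        if children is None:
            encoded[span] = (tokens[span[0]],)
        else:
            left, right = children
            # Both children are already computed; only the right child's
            # final token is new relative to the left child.
            encoded[span] = encoded[left] + encoded[right][-1:]
    return encoded
```

For a made-up three-token input such as `["chest", "pain", "radiating"]` with `max_n=3`, the hierarchy holds six units (three unigrams, two bigrams, one trigram), and the trigram is built from the two bigrams rather than from scratch.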

Cite

@article{arxiv.1907.08532,
  title  = {Multi-Granular Text Encoding for Self-Explaining Categorization},
  author = {Zhiguo Wang and Yue Zhang and Mo Yu and Wei Zhang and Lin Pan and Linfeng Song and Kun Xu and Yousef El-Kurdi},
  journal= {arXiv preprint arXiv:1907.08532},
  year   = {2019}
}

Comments

Accepted by BlackboxNLP 2019