ICDBigBird: A Contextual Embedding Model for ICD Code Classification

Computation and Language 2022-04-25 v1 Machine Learning

Abstract

The International Classification of Diseases (ICD) system is the international standard for classifying diseases and procedures during a healthcare encounter and is widely used for healthcare reporting and management. Assigning correct codes for clinical procedures is important for clinical, operational, and financial decision-making in healthcare. Contextual word embedding models have achieved state-of-the-art results in multiple NLP tasks. However, these models have yet to achieve state-of-the-art results on the ICD classification task, because one of their main disadvantages is that they can only process documents containing a small number of tokens, which is rarely the case with real patient notes. In this paper, we introduce ICDBigBird, a BigBird-based model that integrates a Graph Convolutional Network (GCN), which takes advantage of the relations between ICD codes to create 'enriched' representations of their embeddings, with a BigBird contextual model that can process larger documents. Our experiments on a real-world clinical dataset demonstrate the effectiveness of our BigBird-based model on the ICD classification task, as it outperforms the previous state-of-the-art models.
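
The idea of enriching ICD code embeddings with a GCN and matching them against a document representation can be illustrated with a minimal sketch. This is not the authors' implementation: the adjacency matrix, the dimensions, the single-layer GCN, the dot-product scoring, and the random `doc_vec` standing in for a pooled BigBird output are all assumptions made for illustration.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} of a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # node degrees (always >= 1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(adj_norm, features, weight):
    """One graph convolution: mix each code's features with its neighbours', then ReLU."""
    return np.maximum(adj_norm @ features @ weight, 0.0)

def score_codes(doc_vec, code_embeddings):
    """Hypothetical scoring: sigmoid of the dot product between document and each code."""
    logits = code_embeddings @ doc_vec
    return 1.0 / (1.0 + np.exp(-logits))

rng = np.random.default_rng(0)
num_codes, dim = 4, 8

# Hypothetical relation graph between 4 ICD codes; code 3 has no related codes.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 0]], dtype=float)

code_features = rng.normal(size=(num_codes, dim))  # initial code embeddings
weight = rng.normal(size=(dim, dim))               # GCN weights (learned in practice)

adj_norm = normalize_adjacency(adj)
enriched = gcn_layer(adj_norm, code_features, weight)  # 'enriched' code embeddings

doc_vec = rng.normal(size=dim)  # stand-in for a long-document encoder output
probs = score_codes(doc_vec, enriched)  # one independent probability per code
```

Because ICD coding is a multi-label task, the sketch scores each code independently with a sigmoid rather than a softmax over codes.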

Cite

@article{arxiv.2204.10408,
  title  = {ICDBigBird: A Contextual Embedding Model for ICD Code Classification},
  author = {George Michalopoulos and Michal Malyska and Nicola Sahar and Alexander Wong and Helen Chen},
  journal = {arXiv preprint arXiv:2204.10408},
  year   = {2022}
}

Comments

7 pages, 1 figure, accepted in BioNLP 2022