ICDBigBird: A Contextual Embedding Model for ICD Code Classification
Abstract
The International Classification of Diseases (ICD) system is the international standard for classifying diseases and procedures during a healthcare encounter and is widely used for healthcare reporting and management purposes. Assigning correct codes for clinical procedures is important for clinical, operational, and financial decision-making in healthcare. Contextual word embedding models have achieved state-of-the-art results in multiple NLP tasks. However, these models have yet to achieve state-of-the-art results on the ICD classification task, because one of their main limitations is that they can only process documents containing a small number of tokens, which is rarely the case for real patient notes. In this paper, we introduce ICDBigBird, a BigBird-based model that integrates a Graph Convolutional Network (GCN) with a BigBird contextual model. The GCN exploits the relations between ICD codes to create 'enriched' representations of their embeddings, while BigBird can process much longer documents. Our experiments on a real-world clinical dataset demonstrate the effectiveness of our BigBird-based model on the ICD classification task, as it outperforms the previous state-of-the-art models.
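The abstract does not specify exactly how the GCN and BigBird components are wired together, so the following is only a rough illustration under stated assumptions, not the authors' implementation. It assumes one graph-convolution layer with the standard symmetric normalization from Kipf and Welling, $\hat{A} = D^{-1/2}(A + I)D^{-1/2}$, applied to ICD code embeddings over a code-relation graph. Each enriched code embedding is then scored against a pooled document vector through a sigmoid for multi-label prediction. The toy graph, dimensions, random weights, and the helper names `normalized_adjacency`, `gcn_layer`, and `score_codes` are all hypothetical, and the random `doc_vec` merely stands in for a BigBird document representation.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops so every node keeps its own features
    deg = a_hat.sum(axis=1)                 # node degrees (>= 1 thanks to the self-loops)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    return d_inv_sqrt @ a_hat @ d_inv_sqrt


def gcn_layer(a_norm: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph convolution: ReLU(A_norm @ H @ W), mixing each code with its neighbours."""
    return np.maximum(a_norm @ h @ w, 0.0)


def score_codes(doc_vec: np.ndarray, code_emb: np.ndarray) -> np.ndarray:
    """ASSUMED scoring scheme: sigmoid of the dot product between the
    document vector and each enriched code embedding (one score per code)."""
    logits = code_emb @ doc_vec
    return 1.0 / (1.0 + np.exp(-logits))


rng = np.random.default_rng(0)

# Hypothetical toy code graph: code 0 is a parent of codes 1 and 2; code 3 is isolated.
adj = np.zeros((4, 4))
for parent, child in [(0, 1), (0, 2)]:
    adj[parent, child] = adj[child, parent] = 1.0

dim = 8
code_emb = rng.normal(size=(4, dim))             # initial ICD code embeddings (hypothetical)
w = rng.normal(size=(dim, dim)) / np.sqrt(dim)   # GCN weight matrix (random, untrained)

# 'Enriched' code embeddings after one round of neighbourhood aggregation.
enriched = gcn_layer(normalized_adjacency(adj), code_emb, w)

# Random stand-in for a pooled BigBird representation of a long clinical note.
doc_vec = rng.normal(size=dim)
probs = score_codes(doc_vec, enriched)           # independent probability per ICD code
predicted = np.flatnonzero(probs > 0.5)          # indices of codes assigned to the note
```

Because ICD coding is multi-label, each code gets an independent sigmoid score rather than a softmax over all codes. An isolated code (code 3 here) receives no information from neighbours and keeps only its own transformed embedding.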
Cite
@article{arxiv.2204.10408,
title = {ICDBigBird: A Contextual Embedding Model for ICD Code Classification},
author = {George Michalopoulos and Michal Malyska and Nicola Sahar and Alexander Wong and Helen Chen},
journal = {arXiv preprint arXiv:2204.10408},
year = {2022}
}
Comments
7 pages, 1 figure, accepted at BioNLP 2022