BERT-based Ranking for Biomedical Entity Normalization

Subjects: Information Retrieval; Computation and Language; Machine Learning. Submitted 2019-08-12 (v1).

Abstract

Developing high-performance entity normalization algorithms that can alleviate the term variation problem is of great interest to the biomedical community. Although deep learning-based methods have been successfully applied to biomedical entity normalization, they often depend on traditional context-independent word embeddings. Bidirectional Encoder Representations from Transformers (BERT), BERT for Biomedical Text Mining (BioBERT) and BERT for Clinical Text Mining (ClinicalBERT) were recently introduced to pre-train contextualized word representation models using bidirectional Transformers, advancing the state of the art for many natural language processing tasks. In this study, we propose an entity normalization architecture that fine-tunes the pre-trained BERT / BioBERT / ClinicalBERT models, and we conduct extensive experiments on three different types of datasets to evaluate the effectiveness of these pre-trained models for biomedical entity normalization. Our experimental results show that the best fine-tuned models consistently outperformed previous methods and advanced the state of the art for biomedical entity normalization, with accuracy improvements of up to 1.17%.
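The title frames normalization as ranking: for each mention, candidate concept names are scored against the mention and the top-scoring concept is returned. The sketch below illustrates only that general formulation, assuming a candidate list is already available. It is not the authors' implementation. The scorer is a hypothetical token-overlap stand-in for a fine-tuned BERT sentence-pair classifier, and the concept IDs and names are illustrative.

```python
from typing import Callable, Dict, List, Tuple

# A scorer maps a (mention, candidate concept name) pair to a relevance score.
# In a BERT-based ranker this role would be played by a fine-tuned
# sentence-pair model; here it is a hypothetical stand-in so the sketch
# runs without any model or download.
Scorer = Callable[[str, str], float]


def token_overlap_score(mention: str, name: str) -> float:
    """Hypothetical placeholder scorer: Jaccard overlap of lowercase tokens."""
    a = set(mention.lower().split())
    b = set(name.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def normalize(mention: str,
              candidates: List[Tuple[str, str]],
              score: Scorer) -> str:
    """Return the concept ID of the highest-scoring candidate.

    `candidates` is a list of (concept_id, concept_name) pairs, assumed to
    come from a separate candidate-generation step that is not shown here.
    On ties, max() keeps the first candidate in list order.
    """
    best_id, _ = max(candidates, key=lambda c: score(mention, c[1]))
    return best_id


def accuracy(gold: Dict[str, str], predicted: Dict[str, str]) -> float:
    """Fraction of mentions whose predicted concept ID matches the gold ID."""
    if not gold:
        return 0.0
    correct = sum(predicted.get(m) == cid for m, cid in gold.items())
    return correct / len(gold)
```

With hypothetical candidates `[("C1", "myocardial infarction"), ("C2", "heart failure"), ("C3", "cardiac arrest")]`, the mention "acute myocardial infarction" is ranked to C1, but "heart attack" (a common synonym for myocardial infarction) is ranked to C2 because it shares only the token "heart" with "heart failure". Surface-overlap scoring fails on exactly this kind of term variation, which is the gap that a contextualized, fine-tuned scorer is meant to close.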

Cite

@article{arxiv.1908.03548,
  title  = {BERT-based Ranking for Biomedical Entity Normalization},
  author = {Zongcheng Ji and Qiang Wei and Hua Xu},
  journal= {arXiv preprint arXiv:1908.03548},
  year   = {2019}
}

Comments

9 pages, 1 figure, 4 tables