Automatic Pharma News Categorization

Stanislaw Adaszewski; Pascal Kuner; Ralf J. Jaeger

Information Retrieval 2022-01-04 v1 Machine Learning

Abstract

We use a well-balanced text dataset consisting of 23 news categories relevant to pharma information science to compare the fine-tuning performance of multiple autoregressive and autoencoding transformer models in a classification task. To validate the winning approach, we perform diagnostics of model behavior on mispredicted instances, including inspection of category-wise metrics, evaluation of prediction certainty, and assessment of latent space representations. Lastly, we propose an ensemble model consisting of the top-performing individual predictors and demonstrate that this approach offers a modest improvement in the F1 metric.

Keywords

news recommendation; text classification; clinical natural language processing

Cite

@article{arxiv.2201.00688,
  title  = {Automatic Pharma News Categorization},
  author = {Stanislaw Adaszewski and Pascal Kuner and Ralf J. Jaeger},
  journal= {arXiv preprint arXiv:2201.00688},
  year   = {2022}
}

Comments

5 pages, 1 figure, 9 pages appendix
