Modular Self-Supervision for Document-Level Relation Extraction

Sheng Zhang; Cliff Wong; Naoto Usuyama; Sarthak Jain; Tristan Naumann; Hoifung Poon

Computation and Language 2021-09-14 v1

Abstract

Extracting relations across large text spans has been relatively underexplored in NLP, but it is particularly important for high-value domains such as biomedicine, where obtaining high recall of the latest findings is crucial for practical applications. Compared to conventional information extraction confined to short text spans, document-level relation extraction faces additional challenges in both inference and learning. Given longer text spans, state-of-the-art neural architectures are less effective and task-specific self-supervision such as distant supervision becomes very noisy. In this paper, we propose decomposing document-level relation extraction into relation detection and argument resolution, taking inspiration from Davidsonian semantics. This enables us to incorporate explicit discourse modeling and leverage modular self-supervision for each sub-problem, which is less noise-prone and can be further refined end-to-end via variational EM. We conduct a thorough evaluation in biomedical machine reading for precision oncology, where cross-paragraph relation mentions are prevalent. Our method outperforms prior state of the art, such as multi-scale learning and graph neural networks, by over 20 absolute F1 points. The gain is particularly pronounced among the most challenging relation instances whose arguments never co-occur in a paragraph.

Keywords

relation extraction; information extraction; clinical natural language processing

Cite

@article{arxiv.2109.05362,
  title   = {Modular Self-Supervision for Document-Level Relation Extraction},
  author  = {Sheng Zhang and Cliff Wong and Naoto Usuyama and Sarthak Jain and Tristan Naumann and Hoifung Poon},
  journal = {arXiv preprint arXiv:2109.05362},
  year    = {2021}
}

Comments

EMNLP 2021
