Unsupervised Extractive Summarization using Pointwise Mutual Information

Computation and Language; Machine Learning. 2021-03-24 (v2)

Abstract

Unsupervised approaches to extractive summarization usually rely on a notion of sentence importance defined by the semantic similarity between a sentence and the document. We propose new metrics of relevance and redundancy using pointwise mutual information (PMI) between sentences, which can be easily computed by a pre-trained language model. Intuitively, a relevant sentence allows readers to infer the document content (high PMI with the document), and a redundant sentence can be inferred from the summary (high PMI with the summary). We then develop a greedy sentence selection algorithm to maximize relevance and minimize redundancy of extracted sentences. We show that our method outperforms similarity-based methods on datasets in a range of domains including news, medical journal articles, and personal anecdotes.
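The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `pmi`, `greedy_select`, `relevance`, `redundancy`, and the trade-off weight `lam` are hypothetical, summing the redundancy over already selected sentences is an assumption, and the toy scorers below stand in for quantities that the abstract says come from a pre-trained language model.

```python
import math
from typing import Callable, List, Sequence


def pmi(log_p_y_given_x: float, log_p_y: float) -> float:
    """Pointwise mutual information: PMI(x; y) = log p(y | x) - log p(y).

    In the setting of the abstract, both log-probabilities would be
    estimated by a pre-trained language model (not done here).
    """
    return log_p_y_given_x - log_p_y


def greedy_select(
    sentences: Sequence[str],
    relevance: Callable[[str], float],
    redundancy: Callable[[str, str], float],
    k: int,
    lam: float = 1.0,
) -> List[int]:
    """Greedily pick up to k sentence indices.

    Each step adds the sentence with the highest relevance minus
    lam times its summed redundancy with the sentences chosen so far.
    Both the sum and the weight lam are assumptions of this sketch.
    """
    selected: List[int] = []
    remaining = list(range(len(sentences)))
    while remaining and len(selected) < k:
        def gain(i: int) -> float:
            penalty = sum(redundancy(sentences[j], sentences[i]) for j in selected)
            return relevance(sentences[i]) - lam * penalty

        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return sorted(selected)  # return indices in document order


# Toy stand-ins (assumptions): fixed relevance scores and a word-overlap
# count in place of language-model PMI estimates.
toy_sentences = ["a b c", "a b c d", "x y z"]
toy_relevance = {"a b c": 3.0, "a b c d": 2.9, "x y z": 1.0}


def toy_redundancy(s: str, t: str) -> float:
    return float(len(set(s.split()) & set(t.split())))


print(greedy_select(toy_sentences, toy_relevance.get, toy_redundancy, k=2))  # [0, 2]
```

In the toy run, the second sentence has nearly the highest relevance but is skipped because it overlaps heavily with the first one already in the summary, which mirrors the relevance versus redundancy trade-off the abstract describes.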

Cite

@article{arxiv.2102.06272,
  title  = {Unsupervised Extractive Summarization using Pointwise Mutual Information},
  author = {Vishakh Padmakumar and He He},
  journal= {arXiv preprint arXiv:2102.06272},
  year   = {2021}
}

Comments

To appear at EACL 2021