Keyphrase Extraction : Enhancing Lists

Mario Jarmasz; Caroline Barrière

Computation and Language; Information Retrieval | 2012-04-03 | v1

Abstract

This paper proposes some modest improvements to Extractor, a state-of-the-art keyphrase extraction system, by using a terabyte-sized corpus to estimate the informativeness and semantic similarity of keyphrases. We present two techniques to improve the organization of keyphrase lists and to remove outliers from them. The first is a simple ordering of keyphrases according to their occurrences in the corpus; the second is clustering according to semantic similarity. Evaluation issues are discussed. We also present a novel technique for comparing extracted keyphrases to a gold standard, which relies on semantic similarity rather than on string matching or on an evaluation involving human judges.
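The ideas in the abstract can be illustrated with a minimal sketch. The abstract does not say in which direction keyphrases are ordered or which similarity measure is used, so the code below makes two loud assumptions: rarer phrases are listed first, and similarity is approximated by pointwise mutual information (PMI) over document counts. All counts, phrases, function names, and the threshold are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical toy statistics standing in for counts from a large corpus.
TOTAL_DOCS = 1_000_000
DOC_FREQ = {
    "neural network": 1200,
    "backpropagation": 300,
    "learning": 90_000,
    "banana": 5000,
}
# Number of documents in which both phrases occur (also hypothetical).
CO_FREQ = {
    frozenset(("neural network", "backpropagation")): 150,
    frozenset(("neural network", "banana")): 1,
}


def order_by_corpus_frequency(keyphrases):
    """Sort keyphrases rarest-first (an assumed proxy for informativeness).

    Phrases missing from DOC_FREQ are treated as frequency 0 and sort first.
    """
    return sorted(keyphrases, key=lambda k: DOC_FREQ.get(k, 0))


def pmi(a, b):
    """Pointwise mutual information of two phrases, in bits."""
    co = CO_FREQ.get(frozenset((a, b)), 0)
    fa, fb = DOC_FREQ.get(a, 0), DOC_FREQ.get(b, 0)
    if co == 0 or fa == 0 or fb == 0:
        return float("-inf")  # the pair was never observed together
    return math.log2(co * TOTAL_DOCS / (fa * fb))


def semantic_match_rate(extracted, gold, threshold=0.0):
    """Fraction of extracted phrases that equal, or are similar to, a gold phrase."""
    if not extracted:
        return 0.0
    hits = 0
    for phrase in extracted:
        # An exact string match counts; otherwise fall back to PMI similarity.
        if phrase in gold or any(pmi(phrase, g) > threshold for g in gold):
            hits += 1
    return hits / len(extracted)


if __name__ == "__main__":
    print(order_by_corpus_frequency(["learning", "backpropagation", "neural network"]))
    print(semantic_match_rate(["backpropagation", "banana"], ["neural network"]))
```

With these toy counts, "backpropagation" is credited as a match for the gold phrase "neural network" even though the strings differ, while "banana" is not. That is the kind of behaviour a similarity-based comparison is meant to capture and plain string matching would miss.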

Keywords

information extraction; information retrieval; evaluation metrics

Cite

@article{arxiv.1204.0255,
  title  = {Keyphrase Extraction : Enhancing Lists},
  author = {Mario Jarmasz and Caroline Barrière},
  journal= {arXiv preprint arXiv:1204.0255},
  year   = {2012}
}

Comments

8 pages; Proceedings of the 2nd Conference on Computational Linguistics in the North-East (CLiNE 2004), Montréal, Canada, August
