Keyphrase Extraction : Enhancing Lists
Abstract
This paper proposes modest improvements to Extractor, a state-of-the-art keyphrase extraction system, using a terabyte-sized corpus to estimate the informativeness and semantic similarity of keyphrases. We present two techniques for improving the organization of keyphrase lists and removing outliers from them. The first is a simple ordering of keyphrases by their occurrences in the corpus; the second is clustering by semantic similarity. We also discuss evaluation issues and present a novel technique for comparing extracted keyphrases to a gold standard, which relies on semantic similarity rather than on string matching or on an evaluation involving human judges.
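The two list-enhancement techniques can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy counts, the choice to list rarer phrases first, the use of pointwise mutual information (PMI) over co-occurrence counts as the similarity measure, the single-link grouping, and the threshold value.

```python
import math

def order_by_corpus_count(keyphrases, counts):
    """Order keyphrases by corpus occurrences, rarest first.

    Assumption: fewer occurrences is read as 'more informative'.
    Ties are broken alphabetically so the result is deterministic.
    """
    return sorted(keyphrases, key=lambda k: (counts.get(k, 0), k))

def pmi(a, b, counts, pair_counts, total):
    """Pointwise mutual information from hypothetical corpus counts."""
    joint = pair_counts.get(frozenset((a, b)), 0)
    if joint == 0 or counts.get(a, 0) == 0 or counts.get(b, 0) == 0:
        return float("-inf")  # never seen together: treated as unrelated
    return math.log2(joint * total / (counts[a] * counts[b]))

def cluster(keyphrases, counts, pair_counts, total, threshold):
    """Single-link grouping: a phrase joins every cluster that holds
    at least one member whose PMI with it reaches the threshold."""
    clusters = []
    for phrase in keyphrases:
        linked = [c for c in clusters
                  if any(pmi(phrase, m, counts, pair_counts, total) >= threshold
                         for m in c)]
        others = [c for c in clusters if c not in linked]
        merged = [phrase] + [m for c in linked for m in c]
        clusters = others + [merged]
    return clusters

# Toy statistics (invented for this example).
counts = {"neural network": 1000, "backpropagation": 200, "banana": 5000}
pair_counts = {frozenset(("neural network", "backpropagation")): 150}
total = 1_000_000
phrases = ["neural network", "backpropagation", "banana"]

ordered = order_by_corpus_count(phrases, counts)
groups = cluster(phrases, counts, pair_counts, total, threshold=1.0)
# Singleton clusters are flagged as candidate outliers.
outliers = [c[0] for c in groups if len(c) == 1]
```

In this toy setting, a phrase that never co-occurs with the others ends up alone in its cluster, which is one simple way to flag it as a candidate outlier.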
Cite
@article{arxiv.1204.0255,
title = {Keyphrase Extraction : Enhancing Lists},
author = {Mario Jarmasz and Caroline Barrière},
journal = {arXiv preprint arXiv:1204.0255},
year = {2012}
}
Comments
8 pages; Proceedings of the 2nd Conference on Computational Linguistics in the North-East (CLiNE 2004), Montréal, Canada, August