XSTEM: An exemplar-based stemming algorithm

Computation and Language 2024-06-04 v2

Abstract

Stemming is the process of reducing related words to a standard form by removing affixes from them. Existing algorithms vary with respect to their complexity, configurability, handling of unknown words, and ability to avoid under- and over-stemming. This paper presents a fast, simple, configurable, high-precision, high-recall stemming algorithm that combines the simplicity and performance of word-based lookup tables with the strong generalizability of rule-based methods to avert problems with out-of-vocabulary words.
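
As a rough illustration of the general idea of pairing a word lookup table with a rule-based fallback, the sketch below first looks a word up in a table and, failing that, strips a suffix. It is not the XSTEM algorithm itself, whose details are not given in this abstract. The table `EXEMPLARS`, the list `SUFFIXES`, the threshold `MIN_STEM`, and the function `stem` are all hypothetical names chosen for this example.

```python
# Illustrative only: a generic lookup-plus-rules stemmer, NOT the XSTEM algorithm.
# The table, suffix list, and threshold below are assumptions made for this sketch.

EXEMPLARS = {"ran": "run", "running": "run", "mice": "mouse"}  # known word -> stem
SUFFIXES = ("ing", "ed", "es", "s")  # fallback suffixes for unknown words
MIN_STEM = 3  # never strip a suffix if fewer than this many characters would remain


def stem(word: str) -> str:
    """Return a stem: table lookup first, then suffix stripping as a fallback."""
    word = word.lower()
    # 1. Lookup table: fast and exact for words it contains.
    if word in EXEMPLARS:
        return EXEMPLARS[word]
    # 2. Rule-based fallback: handles out-of-vocabulary words.
    #    Longer suffixes are tried first so "es" wins over "s".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    # 3. Nothing applies: leave the word unchanged.
    return word


if __name__ == "__main__":
    for w in ["mice", "Running", "walking", "walked", "boxes", "is"]:
        print(w, "->", stem(w))
```

In this sketch the table takes care of irregular forms that no suffix rule would catch, and the minimum-length check is one simple guard against over-stemming short words such as "is".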

Cite

@article{arxiv.2205.04355,
  title  = {XSTEM: An exemplar-based stemming algorithm},
  author = {Kirk Baker},
  journal= {arXiv preprint arXiv:2205.04355},
  year   = {2024}
}