Related papers: Suffix Stripping Problem as an Optimization Proble…

XSTEM: An exemplar-based stemming algorithm

Stemming is the process of reducing related words to a standard form by removing affixes from them. Existing algorithms vary with respect to their complexity, configurability, handling of unknown words, and ability to avoid under- and…

Computation and Language · Computer Science 2024-06-04 Kirk Baker

A Literature Review: Stemming Algorithms for Indian Languages

Stemming is the process of extracting root word from the given inflection word. It also plays significant role in numerous application of Natural Language Processing (NLP). The stemming problem has addressed in many contexts and by…

Computation and Language · Computer Science 2013-08-27 M. Thangarasu, R. Manavalan

SS4MCT: A Statistical Stemmer for Morphologically Complex Texts

There have been multiple attempts to resolve various inflection matching problems in information retrieval. Stemming is a common approach to this end. Among many techniques for stemming, statistical stemming has been shown to be effective…

Information Retrieval · Computer Science 2016-06-22 Javid Dadashkarimi, Hossein Nasr Esfahani, Heshaam Faili, Azadeh Shakery

Stemmers for Tamil Language: Performance Analysis

Stemming is the process of extracting root word from the given inflection word and also plays significant role in numerous application of Natural Language Processing (NLP). Tamil Language raises several challenges to NLP, since it has rich…

Computation and Language · Computer Science 2013-10-03 M. Thangarasu, R. Manavalan

N-gram Statistical Stemmer for Bangla Corpus

Stemming is a process that can be utilized to trim inflected words to stem or root form. It is useful for enhancing the retrieval effectiveness, especially for text search in order to solve the mismatch problems. Previous research on Bangla…

Computation and Language · Computer Science 2019-12-30 Rabeya Sadia, Md Ataur Rahman, Md Hanif Seddiqui

Stemmer for Serbian language

In linguistic morphology and information retrieval, stemming is the process for reducing inflected (or sometimes derived) words to their stem, base or root form; generally a written word form. In this work is presented suffix stripping…

Computation and Language · Computer Science 2012-09-21 Nikola Milošević

Large Language Models for Stemming: Promises, Pitfalls and Failures

Text stemming is a natural language processing technique that is used to reduce words to their base form, also known as the root form. The use of stemming in IR has been shown to often improve the effectiveness of keyword-matching models…

Information Retrieval · Computer Science 2024-02-20 Shuai Wang, Shengyao Zhuang, Guido Zuccon

A Nepali Rule Based Stemmer and its performance on different NLP applications

Stemming is an integral part of Natural Language Processing (NLP). It's a preprocessing step in almost every NLP application. Arguably, the most important usage of stemming is in Information Retrieval (IR). While there are lots of work done…

Computation and Language · Computer Science 2020-02-25 Pravesh Koirala, Aman Shakya

Overview of Stemming Algorithms for Indian and Non-Indian Languages

Stemming is a pre-processing step in Text Mining applications as well as a very common requirement of Natural Language processing functions. Stemming is the process for reducing inflected words to their stem. The main purpose of stemming is…

Computation and Language · Computer Science 2014-04-11 Dalwadi Bijal, Suthar Sanket

A new keyphrases extraction method based on suffix tree data structure for arabic documents clustering

Document Clustering is a branch of a larger area of scientific study known as data mining, which is an unsupervised classification using to find a structure in a collection of unlabeled data. The useful information in the documents can be…

Computation and Language · Computer Science 2014-01-23 Issam Sahmoudi, Hanane Froud, Abdelmonaime Lachkar

An Accuracy-Enhanced Stemming Algorithm for Arabic Information Retrieval

This paper provides a method for indexing and retrieving Arabic texts, based on natural language processing. Our approach exploits the notion of template in word stemming and replaces the words by their stems. This technique has proven to…

Computation and Language · Computer Science 2019-11-20 Sadik Bessou, Mohamed Touahria

Scalable String and Suffix Sorting: Algorithms, Techniques, and Tools

This dissertation focuses on two fundamental sorting problems: string sorting and suffix sorting. The first part considers parallel string sorting on shared-memory multi-core machines, the second part external memory suffix sorting using…

Data Structures and Algorithms · Computer Science 2018-08-06 Timo Bingmann

Fast and Compact Regular Expression Matching

We study 4 problems in string matching, namely, regular expression matching, approximate regular expression matching, string edit distance, and subsequence indexing, on a standard word RAM model of computation that allows logarithmic-sized…

Data Structures and Algorithms · Computer Science 2008-09-22 Philip Bille, Martin Farach-Colton

Sampling the suffix array with minimizers

Sampling (evenly) the suffixes from the suffix array is an old idea trading the pattern search time for reduced index space. A few years ago Claude et al. showed an alphabet sampling scheme allowing for more efficient pattern searches…

Data Structures and Algorithms · Computer Science 2014-12-04 Szymon Grabowski, Marcin Raniszewski

Practical Algorithmic Techniques for Several String Processing Problems

The domains of data mining and knowledge discovery make use of large amounts of textual data, which need to be handled efficiently. Specific problems, like finding the maximum weight ordered common subset of a set of ordered sets or…

Data Structures and Algorithms · Computer Science 2009-12-07 Mugurel Ionut Andreica, Nicolae Tapus

CBAS: context based arabic stemmer

Arabic morphology encapsulates many valuable features such as word root. Arabic roots are being utilized for many tasks; the process of extracting a word root is referred to as stemming. Stemming is an essential part of most Natural…

Computation and Language · Computer Science 2016-11-02 Mahmoud El-Defrawy, Yasser El-Sonbaty, Nahla A. Belal

On-line Indexing for General Alphabets via Predecessor Queries on Subsets of an Ordered List

The problem of Text Indexing is a fundamental algorithmic problem in which one wishes to preprocess a text in order to quickly locate pattern queries within the text. In the ever evolving world of dynamic and on-line data, there is also a…

Data Structures and Algorithms · Computer Science 2012-08-21 Tsvi Kopelowitz

Fast k-best Sentence Compression

A popular approach to sentence compression is to formulate the task as a constrained optimization problem and solve it with integer linear programming (ILP) tools. Unfortunately, dependence on ILP may make the compressor prohibitively slow,…

Computation and Language · Computer Science 2015-10-29 Katja Filippova, Enrique Alfonseca

Sorting Algorithms with Restrictions

Sorting is one of the most used and well investigated algorithmic problem [1]. Traditional postulation supposes the sorting data archived, and the elementary operation as comparisons of two numbers. In a view of appearance of new processors…

Data Structures and Algorithms · Computer Science 2011-07-22 Hakob Aslanyan

Optimal-Hash Exact String Matching Algorithms

String matching is the problem of finding all the occurrences of a pattern in a text. We propose improved versions of the fast family of string matching algorithms based on hashing $q$-grams. The improvement consists of considering minimal…

Data Structures and Algorithms · Computer Science 2023-03-13 Thierry Lecroq