Related papers: XSTEM: An exemplar-based stemming algorithm

Overview of Stemming Algorithms for Indian and Non-Indian Languages

Stemming is a pre-processing step in Text Mining applications as well as a very common requirement of Natural Language processing functions. Stemming is the process for reducing inflected words to their stem. The main purpose of stemming is…

Computation and Language · Computer Science 2014-04-11 Dalwadi Bijal, Suthar Sanket

Suffix Stripping Problem as an Optimization Problem

Stemming or suffix stripping, an important part of the modern Information Retrieval systems, is to find the root word (stem) out of a given cluster of words. Existing algorithms targeting this problem have been developed in a haphazard…

Information Retrieval · Computer Science 2013-12-25 B. P. Pande, Pawan Tamta, H. S. Dhami

A Literature Review: Stemming Algorithms for Indian Languages

Stemming is the process of extracting the root word from a given inflected word. It also plays a significant role in numerous applications of Natural Language Processing (NLP). The stemming problem has been addressed in many contexts and by…

Computation and Language · Computer Science 2013-08-27 M. Thangarasu, R. Manavalan

Large Language Models for Stemming: Promises, Pitfalls and Failures

Text stemming is a natural language processing technique that is used to reduce words to their base form, also known as the root form. The use of stemming in IR has been shown to often improve the effectiveness of keyword-matching models…

Information Retrieval · Computer Science 2024-02-20 Shuai Wang, Shengyao Zhuang, Guido Zuccon

Stemmer for Serbian language

In linguistic morphology and information retrieval, stemming is the process for reducing inflected (or sometimes derived) words to their stem, base or root form; generally a written word form. In this work is presented suffix stripping…

Computation and Language · Computer Science 2012-09-21 Nikola Milošević

SS4MCT: A Statistical Stemmer for Morphologically Complex Texts

There have been multiple attempts to resolve various inflection matching problems in information retrieval. Stemming is a common approach to this end. Among many techniques for stemming, statistical stemming has been shown to be effective…

Information Retrieval · Computer Science 2016-06-22 Javid Dadashkarimi, Hossein Nasr Esfahani, Heshaam Faili, Azadeh Shakery

N-gram Statistical Stemmer for Bangla Corpus

Stemming is a process that can be utilized to trim inflected words to stem or root form. It is useful for enhancing the retrieval effectiveness, especially for text search in order to solve the mismatch problems. Previous research on Bangla…

Computation and Language · Computer Science 2019-12-30 Rabeya Sadia, Md Ataur Rahman, Md Hanif Seddiqui

UzbekStemmer: Development of a Rule-Based Stemming Algorithm for Uzbek Language

In this paper we present a rule-based stemming algorithm for the Uzbek language. Uzbek is an agglutinative language, so many words are formed by adding suffixes, and the number of suffixes is also large. For this reason, it is difficult to…

Computation and Language · Computer Science 2022-10-31 Maksud Sharipov, Ollabergan Yuldashov

Stemmers for Tamil Language: Performance Analysis

Stemming is the process of extracting the root word from a given inflected word, and it also plays a significant role in numerous applications of Natural Language Processing (NLP). The Tamil language raises several challenges for NLP, since it has rich…

Computation and Language · Computer Science 2013-10-03 M. Thangarasu, R. Manavalan

A new hybrid stemming algorithm for Persian

Stemming has been an influential part of information retrieval and search engines. There have been tremendous endeavours to make stemmers that are both efficient and accurate. Stemmers can use three methods of stemming: dictionary based…

Computation and Language · Computer Science 2015-11-25 Adel Rahimi

Schema-Based Automata Determinization

We propose an algorithm for schema-based determinization of finite automata on words and of step-wise hedge automata on nested words. The idea is to integrate schema-based cleaning directly into automata determinization. We prove the…

Formal Languages and Automata Theory · Computer Science 2022-09-22 Joachim Niehren, Momar Sakho, Antonio Al Serhali

A Nepali Rule Based Stemmer and its performance on different NLP applications

Stemming is an integral part of Natural Language Processing (NLP). It's a preprocessing step in almost every NLP application. Arguably, the most important usage of stemming is in Information Retrieval (IR). While there are lots of work done…

Computation and Language · Computer Science 2020-02-25 Pravesh Koirala, Aman Shakya

Rule Based Stemmer in Urdu

Urdu is a combination of several languages like Arabic, Hindi, English, Turkish, Sanskrit etc. It has a complex and rich morphology. This is the reason why not much work has been done in Urdu language processing. Stemming is used to convert…

Computation and Language · Computer Science 2013-10-03 Vaishali Gupta, Nisheeth Joshi, Iti Mathur

An Accuracy-Enhanced Stemming Algorithm for Arabic Information Retrieval

This paper provides a method for indexing and retrieving Arabic texts, based on natural language processing. Our approach exploits the notion of template in word stemming and replaces the words by their stems. This technique has proven to…

Computation and Language · Computer Science 2019-11-20 Sadik Bessou, Mohamed Touahria

Comparing Neural- and N-Gram-Based Language Models for Word Segmentation

Word segmentation is the task of inserting or deleting word boundary characters in order to separate character sequences that correspond to words in some language. In this article we propose an approach based on a beam search algorithm and…

Computation and Language · Computer Science 2018-12-04 Yerai Doval, Carlos Gómez-Rodríguez

CBAS: context based arabic stemmer

Arabic morphology encapsulates many valuable features such as word root. Arabic roots are being utilized for many tasks; the process of extracting a word root is referred to as stemming. Stemming is an essential part of most Natural…

Computation and Language · Computer Science 2016-11-02 Mahmoud El-Defrawy, Yasser El-Sonbaty, Nahla A. Belal

Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic Text Summarization

In Automatic Text Summarization, preprocessing is an important phase to reduce the space of textual representation. Classically, stemming and lemmatization have been widely used for normalizing words. However, even using normalization on…

Information Retrieval · Computer Science 2012-09-17 Juan-Manuel Torres-Moreno

Jointly Learning Word Embeddings and Latent Topics

Word embedding models such as Skip-gram learn a vector-space representation for each word, based on the local word collocation patterns that are observed in a text corpus. Latent topic models, on the other hand, take a more global view,…

Computation and Language · Computer Science 2017-06-23 Bei Shi, Wai Lam, Shoaib Jameel, Steven Schockaert, Kwun Ping Lai

Word Embedding based on Low-Rank Doubly Stochastic Matrix Decomposition

Word embedding, which encodes words into vectors, is an important starting point in natural language processing and commonly used in many text-based machine learning tasks. However, in most current word embedding approaches, the similarity…

Computation and Language · Computer Science 2018-12-27 Denis Sedov, Zhirong Yang

Approximate textual retrieval

An approximate textual retrieval algorithm for searching sources with high levels of defects is presented. It considers splitting the words in a query into two overlapping segments and subsequently building composite regular expressions…

Information Retrieval · Computer Science 2007-05-23 Pere Constans