Related papers: Clustering Algorithm for Gujarati Language

Bangla Word Clustering Based on Tri-gram, 4-gram and 5-gram Language Model

In this paper, we describe a research method that generates Bangla word clusters on the basis of semantic and contextual similarity. Word clustering is important for part-of-speech (POS) tagging, word sense…

Computation and Language · Computer Science 2017-01-31 Dipaloke Saha, Md Saddam Hossain, MD. Saiful Islam, Sabir Ismail

Improving the quality of Gujarati-Hindi Machine Translation through part-of-speech tagging and stemmer-assisted transliteration

Machine Translation for Indian languages is an emerging research area. Transliteration is one such module that we design while designing a translation system. Transliteration means mapping of source language text into the target language.…

Computation and Language · Computer Science 2013-07-15 Juhi Ameta, Nisheeth Joshi, Iti Mathur

A Literature Review: Stemming Algorithms for Indian Languages

Stemming is the process of extracting the root word from a given inflected word. It also plays a significant role in numerous applications of Natural Language Processing (NLP). The stemming problem has been addressed in many contexts and by…

Computation and Language · Computer Science 2013-08-27 M. Thangarasu, R. Manavalan

Using Genetic Algorithms for Texts Classification Problems

The avalanche of information produced by mankind has led to the concept of automated knowledge extraction, known as Data Mining ([1]). This direction is connected with a wide spectrum of problems, from recognition of the fuzzy set to…

Machine Learning · Computer Science 2009-06-05 A. A. Shumeyko, S. L. Sotnik

Computing Word Classes Using Spectral Clustering

Clustering a lexicon of words is a well-studied problem in natural language processing (NLP). Word clusters are used to deal with sparse data in statistical language processing, as well as features for solving various NLP tasks (text…

Computation and Language · Computer Science 2018-08-17 Effi Levi, Saggy Herman, Ari Rappoport

Overview of Stemming Algorithms for Indian and Non-Indian Languages

Stemming is a pre-processing step in Text Mining applications as well as a very common requirement of Natural Language Processing functions. Stemming is the process of reducing inflected words to their stem. The main purpose of stemming is…

Computation and Language · Computer Science 2014-04-11 Dalwadi Bijal, Suthar Sanket

N-gram Statistical Stemmer for Bangla Corpus

Stemming is a process that can be used to trim inflected words to their stem or root form. It is useful for enhancing retrieval effectiveness, especially in text search, where it helps resolve mismatch problems. Previous research on Bangla…

Computation and Language · Computer Science 2019-12-30 Rabeya Sadia, Md Ataur Rahman, Md Hanif Seddiqui

Testing network clustering algorithms with Natural Language Processing

The advent of online social networks has led to the development of an abundant literature on the study of online social groups and their relationship to individuals' personalities as revealed by their textual productions. Social structures…

Social and Information Networks · Computer Science 2024-06-26 Ixandra Achitouv, David Chavalarias, Bruno Gaume

Mimicking Human Process: Text Representation via Latent Semantic Clustering for Classification

Considering that words with different characteristics in a text have different importance for classification, grouping them separately can strengthen the semantic expression of each part. Thus we propose a new text representation…

Computation and Language · Computer Science 2019-06-19 Xiaoye Tan, Rui Yan, Chongyang Tao, Mingrui Wu

Natural language processing for clusterization of genes according to their functions

There are hundreds of methods for analyzing data obtained in mRNA sequencing. Most of them focus on a small number of genes. In this study, we propose an approach that reduces the analysis of several thousand genes to analysis of…

Computation and Language · Computer Science 2023-08-28 Vladislav Dordiuk, Ekaterina Demicheva, Fernando Polanco Espino, Konstantin Ushenin

Experimental Estimation of Number of Clusters Based on Cluster Quality

Text Clustering is a text mining technique which divides the given set of text documents into significant clusters. It is used for organizing a huge number of text documents into a well-organized form. In the majority of the clustering…

Information Retrieval · Computer Science 2015-03-12 G. Hannah Grace, Kalyani Desikan

Neural Compound-Word (Sandhi) Generation and Splitting in Sanskrit Language

This paper describes neural-network-based approaches to the formation and splitting of compound words, known respectively as Sandhi and Vichchhed, in the Sanskrit language. Sandhi is an important idea essential to…

Computation and Language · Computer Science 2024-09-05 Sushant Dave, Arun Kumar Singh, Prathosh A. P., Brejesh Lall

Introduction to Clustering Algorithms and Applications

Data clustering is the process of identifying natural groupings or clusters within multidimensional data based on some similarity measure. Clustering is a fundamental process in many different disciplines. Hence, researchers from different…

Machine Learning · Computer Science 2014-08-26 Sibei Yang, Liangde Tao, Bingchen Gong

A Short Survey on Data Clustering Algorithms

With rapidly increasing data, clustering algorithms are important tools for data analytics in modern research. They have been successfully applied to a wide range of domains; for instance, bioinformatics, speech recognition, and financial…

Data Structures and Algorithms · Computer Science 2015-12-01 Ka-Chun Wong

Multilingual Neural Machine Translation with Language Clustering

Multilingual neural machine translation (NMT), which translates multiple languages using a single model, is of great practical importance due to its advantages in simplifying the training process, reducing online maintenance costs, and…

Computation and Language · Computer Science 2019-08-27 Xu Tan, Jiale Chen, Di He, Yingce Xia, Tao Qin, Tie-Yan Liu

Document clustering with evolved multiword search queries

Text clustering holds significant value across various domains due to its ability to identify patterns and group related information. Current approaches which rely heavily on a computed similarity measure between documents are often limited…

Information Retrieval · Computer Science 2025-04-09 Laurence Hirsch, Robin Hirsch, Bayode Ogunleye

Morpheme Boundary Detection & Grammatical Feature Prediction for Gujarati: Dataset & Model

Developing Natural Language Processing resources for a low-resource language is a challenging but essential task. In this paper, we present a Morphological Analyzer for Gujarati. We have used a Bi-Directional LSTM based approach to perform…

Computation and Language · Computer Science 2024-09-04 Jatayu Baxi, Brijesh Bhatt

Clustering genomic words in human DNA using peaks and trends of distributions

In this work we seek clusters of genomic words in human DNA by studying their inter-word lag distributions. Due to the particularly spiked nature of these histograms, a clustering procedure is proposed that first decomposes each…

Applications · Statistics 2021-01-13 Ana Helena Tavares, Jakob Raymaekers, Peter J. Rousseeuw, Paula Brito, Vera Afreixo

Semi-Supervised Constrained Clustering: An In-Depth Overview, Ranked Taxonomy and Future Research Directions

Clustering is a well-known unsupervised machine learning approach capable of automatically grouping discrete sets of instances with similar characteristics. Constrained clustering is a semi-supervised extension to this process that can be…

Machine Learning · Computer Science 2023-03-02 Germán González-Almagro, Daniel Peralta, Eli De Poorter, José-Ramón Cano, Salvador García

Natural Language Processing (almost) from Scratch

We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including: part-of-speech tagging, chunking, named entity recognition, and semantic role labeling. This…

Machine Learning · Computer Science 2011-03-03 Ronan Collobert, Jason Weston, Leon Bottou, Michael Karlen, Koray Kavukcuoglu, Pavel Kuksa