Related papers: Distributed NLP

Distributed Readability Analysis Of Turkish Elementary School Textbooks

The readability assessment deals with estimating the level of difficulty in reading texts. Many readability tests, which do not indicate execution efficiency, have been applied on specific texts to measure the reading grade level in science…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-13 Betul Karakus, Ibrahim Riza Hallac, Galip Aydin

VNLP: Turkish NLP Package

In this work, we present VNLP: the first dedicated, complete, open-source, well-documented, lightweight, production-ready, state-of-the-art Natural Language Processing (NLP) package for the Turkish language. It contains a wide variety of…

Computation and Language · Computer Science 2024-03-05 Meliksah Turker, Mehmet Erdi Ari, Aydin Han

Network calculus for parallel processing

In this note, we present preliminary results on the use of "network calculus" for parallel processing systems, specifically MapReduce.

Performance · Computer Science 2015-02-03 G. Kesidis, B. Urgaonkar, Y. Shan, S. Kamarava, J. Liebeherr

Using MapReduce for Large-scale Medical Image Analysis

The growth of the amount of medical image data produced on a daily basis in modern hospitals forces the adaptation of traditional medical image analysis and indexing approaches towards scalable solutions. The number of images and their…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-10-26 Dimitrios Markonis, Roger Schaer, Ivan Eggel, Henning Müller, Adrien Depeursinge

A Large-Scale Study of Machine Translation in the Turkic Languages

Recent advances in neural machine translation (NMT) have pushed the quality of machine translation systems to the point where they are becoming widely adopted to build competitive systems. However, there is still a large number of languages…

Computation and Language · Computer Science 2021-09-13 Jamshidbek Mirzakhalov, Anoop Babu, Duygu Ataman, Sherzod Kariev, Francis Tyers, Otabek Abduraufov, Mammad Hajili, Sardana Ivanova, Abror Khaytbaev, Antonio Laverghetta, Behzodbek Moydinboyev, Esra Onal, Shaxnoza Pulatova, Ahsan Wahab, Orhan Firat, Sriram Chellappan

GraphLab: A Distributed Framework for Machine Learning in the Cloud

Machine Learning (ML) techniques are indispensable in a wide range of fields. Unfortunately, the exponential increase of dataset sizes is rapidly extending the runtime of sequential algorithms and threatening to slow future progress in ML.…

Machine Learning · Computer Science 2011-07-06 Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin

Document Classification Using Distributed Machine Learning

In this paper, we investigate the performance and success rates of Naïve Bayes Classification Algorithm for automatic classification of Turkish news into predetermined categories like economy, life, health etc. We use Apache Big Data…

Information Retrieval · Computer Science 2018-02-13 Galip Aydin, Ibrahim Riza Hallac

Recent Advancements and Challenges of Turkic Central Asian Language Processing

Research in NLP for Central Asian Turkic languages - Kazakh, Uzbek, Kyrgyz, and Turkmen - faces typical low-resource language challenges like data scarcity, limited linguistic resources and technology development. However, recent…

Computation and Language · Computer Science 2026-02-17 Yana Veitsman, Mareike Hartmann

Parallel Sorted Neighborhood Blocking with MapReduce

Cloud infrastructures enable the efficient parallel execution of data-intensive tasks such as entity resolution on large datasets. We investigate challenges and possible solutions of using the MapReduce programming model for parallel entity…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-10-18 Lars Kolb, Andreas Thor, Erhard Rahm

Performance Evaluation of Distributed Computing Environments with Hadoop and Spark Frameworks

Recently, due to the rapid development of information and communication technologies, data are created and consumed at an avalanche-like rate. Distributed computing creates preconditions for analyzing and processing such Big Data by distributing…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-30 Vladyslav Taran, Oleg Alienin, Sergii Stirenko, A. Rojbi, Yuri Gordienko

Natural Language Processing using Hadoop and KOSHIK

Natural language processing, as a data analytics related technology, is used widely in many research areas such as artificial intelligence, human language processing, and translation. At present, due to explosive growth of data, there are…

Computation and Language · Computer Science 2016-08-17 Emre Erturk, Hong Shi

Parallel Spectral Clustering Algorithm Based on Hadoop

Spectral clustering and cloud computing is an emerging branch of computer science or a related discipline. It overcomes the shortcomings of some traditional clustering algorithms and guarantees convergence to the optimal solution, thus have to…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-06-02 Yajun Cui, Yang Zhao, Kafei Xiao, Chenglong Zhang, Lei Wang

Parallelizing Word2Vec in Shared and Distributed Memory

Word2Vec is a widely used algorithm for extracting low-dimensional vector representations of words. It generated considerable excitement in the machine learning and natural language processing (NLP) communities recently due to its…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-08-09 Shihao Ji, Nadathur Satish, Sheng Li, Pradeep Dubey

NLP Workbench: Efficient and Extensible Integration of State-of-the-art Text Mining Tools

NLP Workbench is a web-based platform for text mining that allows non-expert users to obtain semantic understanding of large-scale corpora using state-of-the-art text mining models. The platform is built upon latest pre-trained models and…

Computation and Language · Computer Science 2024-03-06 Peiran Yao, Matej Kosmajac, Abeer Waheed, Kostyantyn Guzhva, Natalie Hervieux, Denilson Barbosa

An Experimental Evaluation of Performance of A Hadoop Cluster on Replica Management

Hadoop is an open source implementation of the MapReduce Framework in the realm of distributed processing. A Hadoop cluster is a unique type of computational cluster designed for storing and analyzing large data sets across cluster of…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-11-10 Muralikrishnan Ramane, Sharmila Krishnamoorthy, Sasikala Gowtham

Parallelizing Machine Learning as a Service for the End-User

As ML applications are becoming ever more pervasive, fully-trained systems are made increasingly available to a wide public, allowing end-users to submit queries with their own data, and to efficiently retrieve results. With increasingly…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-01 Daniela Loreti, Marco Lippi, Paolo Torroni

TurkicNLP: An NLP Toolkit for Turkic Languages

Natural language processing for the Turkic language family, spoken by over 200 million people across Eurasia, remains fragmented, with most languages lacking unified tooling and resources. We present TurkicNLP, an open-source Python library…

Computation and Language · Computer Science 2026-05-25 Sherzod Hakimov

Mukayese: Turkish NLP Strikes Back

Having sufficient resources for language X lifts it from the under-resourced languages class, but not necessarily from the under-researched class. In this paper, we address the problem of the absence of organized benchmarks in the Turkish…

Computation and Language · Computer Science 2022-03-17 Ali Safaya, Emirhan Kurtuluş, Arda Göktoğan, Deniz Yuret

A fully automated and scalable Parallel Data Augmentation for Low Resource Languages using Image and Text Analytics

Linguistic diversity across the world creates a disparity with the availability of good quality digital language resources thereby restricting the technological benefits to majority of human population. The lack or absence of data resources…

Computation and Language · Computer Science 2025-10-16 Prawaal Sharma, Navneet Goyal, Poonam Goyal, Vishnupriyan R

The Current State of Finnish NLP

There are a lot of tools and resources available for processing Finnish. In this paper, we survey recent papers focusing on Finnish NLP related to many different subcategories of NLP such as parsing, generation, semantics and speech. NLP…

Computation and Language · Computer Science 2021-09-24 Mika Hämäläinen, Khalid Alnajjar