English
Related papers

Related papers: A Survey on Geographically Distributed Big-Data Pr…

200 papers

Distributed data processing platforms for cloud computing are important tools for large-scale data analytics. Apache Hadoop MapReduce has become the de facto standard in this space, though its programming interface is relatively low-level,…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-03-30 Bilal Akil , Ying Zhou , Uwe Röhm

Today, big data is generated from many sources and there is a huge demand for storing, managing, processing, and querying on big data. The MapReduce model and its counterpart open source implementation Hadoop, has proven itself as the de…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-08-04 Saeed Shahrivari , Saeed Jalili

Most of the popular Big Data analytics tools evolved to adapt their working environment to extract valuable information from a vast amount of unstructured data. The ability of data mining techniques to filter this helpful information from…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-09-23 Taha Tekdogan , Ali Cakmak

The exponential growth of data in current times and the demand to gain information and knowledge from the data present new challenges for database researchers. Known database systems and algorithms are no longer capable of effectively…

Databases · Computer Science 2017-12-06 Yaron Gonen

Distributed approaches based on the map-reduce programming paradigm have started to be proposed in the bioinformatics domain, due to the large amount of data produced by the next-generation sequencing techniques. However, the use of…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-05 Umberto Ferraro Petrillo , Mara Sorella , Giuseppe Cattaneo , Raffaele Giancarlo , Simona Rombo

MapReduce, the popular programming paradigm for large-scale data processing, has traditionally been deployed over tightly-coupled clusters where the data is already locally available. The assumption that the data and compute resources are…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-07-31 Benjamin Heintz , Abhishek Chandra , Ramesh K. Sitaraman

Recently, due to rapid development of information and communication technologies, the data are created and consumed in the avalanche way. Distributed computing create preconditions for analyzing and processing such Big Data by distributing…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-30 Vladyslav Taran , Oleg Alienin , Sergii Stirenko , A. Rojbi , Yuri Gordienko

This survey article reviews the challenges associated with deploying and optimizing big data applications and machine learning algorithms in cloud data centers and networks. The MapReduce programming model and its widely-used open-source…

Networking and Internet Architecture · Computer Science 2019-10-03 Sanaa Hamid Mohamed , Taisir E. H. El-Gorashi , Jaafar M. H. Elmirghani

Distributed data processing frameworks (e.g., Hadoop, Spark, and Flink) are widely used to distribute data among computing nodes of a cloud. Recently, there have been increasing efforts aimed at evaluating the performance of distributed…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-01-07 Faheem Ullah , Shagun Dhingra , Xiaoyu Xia , M. Ali Babar

In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a…

Databases · Computer Science 2013-02-14 Sherif Sakr , Anna Liu , Ayman G. Fayoumi

Recently we create so much data (2.5 quintillion bytes every day) that 90% of the data in the world today has been created in the last two years alone [1]. This data comes from sensors used to gather traffic or climate information, posts to…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-12-20 Afsin Akdogan , Hien To

Huge amounts of data being generated continuously by digitally interconnected systems of humans, organizations and machines. Data comes in variety of formats including structured, unstructured and semi-structured, what makes it impossible…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-31 Abzetdin Adamov

Distributed processing frameworks, such as MapReduce, Hadoop, and Spark are popular systems for processing large amounts of data. The design of efficient algorithms in these frameworks is a challenging problem, as the systems both require…

Data Structures and Algorithms · Computer Science 2019-05-07 MohammadTaghi Hajiaghayi , Silvio Lattanzi , Saeed Seddighin , Cliff Stein

When dealing with massive data sorting, we usually use Hadoop which is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. A common approach in implement of…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-06-02 Zhuo Wang , Longlong Tian , Dianjie Guo , Xiaoming Jiang

The objective of this work was to utilize BigBench [1] as a Big Data benchmark and evaluate and compare two processing engines: MapReduce [2] and Spark [3]. MapReduce is the established engine for processing data on Hadoop. Spark is a…

Databases · Computer Science 2016-01-14 Todor Ivanov , Max-Georg Beer

Real-world data from diverse domains require real-time scalable analysis. Large-scale data processing frameworks or engines such as Hadoop fall short when results are needed on-the-fly. Apache Spark's streaming library is increasingly…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-08-02 Janak Dahal , Elias Ioup , Shaikh Arifuzzaman , Mahdi Abdelguerfi

With the explosive increase of big data in industry and academic fields, it is necessary to apply large-scale data processing systems to analysis Big Data. Arguably, Spark is state of the art in large-scale data computing systems nowadays,…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-12-17 Shanjiang Tang , Bingsheng He , Ce Yu , Yusen Li , Kun Li

During the recent years, a number of efficient and scalable frequent itemset mining algorithms for big data analytics have been proposed by many researchers. Initially, MapReduce-based frequent itemset mining algorithms on Hadoop cluster…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-08-06 Pankaj Singh , Sudhakar Singh , P. K. Mishra , Rakhi Garg

This work explores the use of big data technologies deployed in the cloud for processing of astronomical data. We have applied Hadoop and Spark to the task of co-adding astronomical images. We compared the overhead and execution time of…

Instrumentation and Methods for Astrophysics · Physics 2017-04-03 Ivan Kolosov , Sergey Gerasimov , Alexander Meshcheryakov

In this digitalised world where every information is stored, the data a are growing exponentially. It is estimated that data are doubles itself every two years. Geospatial data are one of the prime contributors to the big data scenario.…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-02-23 Rakesh K. Lenka , Rabindra K. Barik , Noopur Gupta , Syed Mohd Ali , Amiya Rath , Harishchandra Dubey
‹ Prev 1 2 3 10 Next ›