Related papers: Iterative MapReduce for Large Scale Machine Learni…

200 papers

The exponential growth of data in current times and the demand to gain information and knowledge from the data present new challenges for database researchers. Known database systems and algorithms are no longer capable of effectively…

Databases · Computer Science 2017-12-06 Yaron Gonen

This survey article reviews the challenges associated with deploying and optimizing big data applications and machine learning algorithms in cloud data centers and networks. The MapReduce programming model and its widely-used open-source…

Networking and Internet Architecture · Computer Science 2019-10-03 Sanaa Hamid Mohamed, Taisir E. H. El-Gorashi, Jaafar M. H. Elmirghani

In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a…

Databases · Computer Science 2013-02-14 Sherif Sakr, Anna Liu, Ayman G. Fayoumi

MapReduce, the popular programming paradigm for large-scale data processing, has traditionally been deployed over tightly-coupled clusters where the data is already locally available. The assumption that the data and compute resources are…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-07-31 Benjamin Heintz, Abhishek Chandra, Ramesh K. Sitaraman

Distributed processing frameworks, such as MapReduce, Hadoop, and Spark are popular systems for processing large amounts of data. The design of efficient algorithms in these frameworks is a challenging problem, as the systems both require…

Data Structures and Algorithms · Computer Science 2019-05-07 MohammadTaghi Hajiaghayi, Silvio Lattanzi, Saeed Seddighin, Cliff Stein

Undoubtedly, the MapReduce is the most powerful programming paradigm in distributed computing. The enhancement of the MapReduce is essential and it can lead the computing faster. Therefore, here are many scheduling algorithms to discuss…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-04-11 Rajdeep Das, Rohit Pratap Singh, Ripon Patgiri

As new data and updates are constantly arriving, the results of data mining applications become stale and obsolete over time. Incremental processing is a promising approach to refreshing mining results. It utilizes previously saved states…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-01-21 Yanfeng Zhang, Shimin Chen, Qiang Wang, Ge Yu

A common approach in the design of MapReduce algorithms is to minimize the number of rounds. Indeed, there are many examples in the literature of monolithic MapReduce algorithms, which are algorithms requiring just one or two rounds.…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-01-22 Matteo Ceccarello, Francesco Silvestri

The map-reduce parallel programming model has become extremely popular in the big data community. Many big data workloads can benefit from the enhanced performance offered by supercomputers. LLMapReduce provides the familiar map-reduce…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-12-13 Chansup Byun, Jeremy Kepner, William Arcand, David Bestor, Bill Bergeron, Vijay Gadepally, Matthew Hubbell, Peter Michaleas, Julie Mullen, Andrew Prout, Antonio Rosa, Charles Yee, Albert Reuther

Parallel dataflow systems are a central part of most analytic pipelines for big data. The iterative nature of many analysis and machine learning algorithms, however, is still a challenge for current systems. While certain types of bulk…

Databases · Computer Science 2012-08-02 Stephan Ewen, Kostas Tzoumas, Moritz Kaufmann, Volker Markl

When dealing with massive data sorting, we usually use Hadoop which is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. A common approach in implement of…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-06-02 Zhuo Wang, Longlong Tian, Dianjie Guo, Xiaoming Jiang

Hadoop is currently the large-scale data analysis "hammer" of choice, but there exist classes of algorithms that aren't "nails", in the sense that they are not particularly amenable to the MapReduce programming model. To address this,…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-09-12 Jimmy Lin

Faced with continuously increasing scale of data, original back-propagation neural network based machine learning algorithm presents two non-trivial challenges: huge amount of data makes it difficult to maintain both efficiency and…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-10-12 Kairan Sun, Xu Wei, Gengtao Jia, Risheng Wang, Ruizhi Li

Cloud Computing is emerging as a new computational paradigm shift. Hadoop-MapReduce has become a powerful Computation Model for processing large data on distributed commodity hardware clusters such as Clouds. In all Hadoop implementations,…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-07-04 B. Thirumala Rao, L. S. S. Reddy

MapReduce has proven to be one of the most useful paradigms in the revolution of distributed computing, where cloud services and cluster computing become the standard venue for computing. The federation of cloud and big data activities is…

Databases · Computer Science 2016-07-29 Foto Afrati, Shlomi Dolev, Shantanu Sharma, Jeffrey D. Ullman

MapReduce (and its open source implementation Hadoop) has become the de facto platform for processing large data sets. MapReduce offers a streamlined computational framework by interleaving sequential and parallel computation while hiding…

Computational Complexity · Computer Science 2019-04-22 Sungjin Im, Benjamin Moseley

Hadoop and Spark are widely used distributed processing frameworks for large-scale data processing in an efficient and fault-tolerant manner on private or public clouds. These big-data processing systems are extensively used by many…

Databases · Computer Science 2017-07-07 Shlomi Dolev, Patricia Florissi, Ehud Gudes, Shantanu Sharma, Ido Singer

MapReduce is a technique used to vastly improve distributed processing of data and can massively speed up computation. Hadoop and its MapReduce relies on JVM and Java which is expensive on memory. High Performance Computing based MapReduce…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-29 Vignesh S., Muthumanikandan V., Siddarth S., Sainath G

Data cubes are widely used as a powerful tool to provide multidimensional views in data warehousing and On-Line Analytical Processing (OLAP). However, with increasing data sizes, it is becoming computationally expensive to perform data cube…

Databases · Computer Science 2013-11-25 Zhengkui Wang, Yan Chu, Kian-Lee Tan, Divyakant Agrawal, Amr El Abbadi, Xiaolong Xu

In this paper, we study the MapReduce framework from an algorithmic standpoint and demonstrate the usefulness of our approach by designing and analyzing efficient MapReduce algorithms for fundamental sorting, searching, and simulation…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-01-11 Michael T. Goodrich, Nodari Sitchinava, Qin Zhang