Related papers: BSP vs MapReduce

200 papers

In this paper, we study the MapReduce framework from an algorithmic standpoint and demonstrate the usefulness of our approach by designing and analyzing efficient MapReduce algorithms for fundamental sorting, searching, and simulation…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-01-11 Michael T. Goodrich, Nodari Sitchinava, Qin Zhang

In this paper, we describe efficient MapReduce simulations of parallel algorithms specified in the BSP and PRAM models. We also provide some applications of these simulation results to problems in parallel computational geometry for the…

Data Structures and Algorithms · Computer Science 2015-03-14 Michael T. Goodrich

Since its introduction in 2004, the MapReduce framework has become one of the standard approaches in massive distributed and parallel computation. In contrast to its intensive use in practice, theoretical footing is still limited and only…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-12-19 Gero Greiner, Riko Jacob

More and more large data collections are gathered worldwide in various IT systems. Many of them are networked in nature and need to be processed and analysed as graph structures. Due to their size, they very often require the use of…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-06-04 Tomasz Kajdanowicz, Przemyslaw Kazienko, Wojciech Indyk

In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a…

Databases · Computer Science 2013-02-14 Sherif Sakr, Anna Liu, Ayman G. Fayoumi

The map-reduce parallel programming model has become extremely popular in the big data community. Many big data workloads can benefit from the enhanced performance offered by supercomputers. LLMapReduce provides the familiar map-reduce…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-12-13 Chansup Byun, Jeremy Kepner, William Arcand, David Bestor, Bill Bergeron, Vijay Gadepally, Matthew Hubbell, Peter Michaleas, Julie Mullen, Andrew Prout, Antonio Rosa, Charles Yee, Albert Reuther

Undoubtedly, MapReduce is the most powerful programming paradigm in distributed computing. Enhancing MapReduce is essential, as it can make computation faster. Therefore, there are many scheduling algorithms to discuss…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-04-11 Rajdeep Das, Rohit Pratap Singh, Ripon Patgiri

Distributed processing frameworks, such as MapReduce, Hadoop, and Spark are popular systems for processing large amounts of data. The design of efficient algorithms in these frameworks is a challenging problem, as the systems both require…

Data Structures and Algorithms · Computer Science 2019-05-07 MohammadTaghi Hajiaghayi, Silvio Lattanzi, Saeed Seddighin, Cliff Stein

Cloud infrastructures enable the efficient parallel execution of data-intensive tasks such as entity resolution on large datasets. We investigate challenges and possible solutions of using the MapReduce programming model for parallel entity…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-10-18 Lars Kolb, Andreas Thor, Erhard Rahm

This work explores fundamental modeling and algorithmic issues arising in the well-established MapReduce framework. First, we formally specify a computational model for MapReduce which captures the functional flavor of the paradigm by…

Data Structures and Algorithms · Computer Science 2013-06-13 Andrea Pietracaprina, Geppino Pucci, Matteo Riondato, Francesco Silvestri, Eli Upfal

Load balance is important for MapReduce to reduce job duration, increase parallel efficiency, etc. Previous work focuses on coarse-grained scheduling. This study concerns fine-grained scheduling on MapReduce operations. Each operation…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-04-15 Liya Fan, Bo Gao, Xi Sun, Fa Zhang, Zhiyong Liu

We explain how the popular, highly abstract MapReduce model of parallel computation (MRC) can be rooted in reality by explaining how it can be simulated on realistic distributed-memory parallel machine models like BSP. We first refine the…

Data Structures and Algorithms · Computer Science 2020-02-19 Peter Sanders

Large datasets ("Big Data") are becoming ubiquitous because the potential value in deriving insights from data, across a wide range of business and scientific applications, is increasingly recognized. In particular, machine learning - one…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-03-15 Joshua Rosen, Neoklis Polyzotis, Vinayak Borkar, Yingyi Bu, Michael J. Carey, Markus Weimer, Tyson Condie, Raghu Ramakrishnan

MapReduce is a technique used to vastly improve distributed processing of data and can massively speed up computation. Hadoop and its MapReduce rely on the JVM and Java, which are expensive in memory. High Performance Computing based MapReduce…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-29 Vignesh S., Muthumanikandan V., Siddarth S., Sainath G

The exponential growth of data in current times and the demand to gain information and knowledge from the data present new challenges for database researchers. Known database systems and algorithms are no longer capable of effectively…

Databases · Computer Science 2017-12-06 Yaron Gonen

The Apriori algorithm that mines frequent itemsets is one of the most popular and widely used data mining algorithms. Nowadays, many algorithms have been proposed on parallel and distributed platforms to enhance the performance of Apriori…

Databases · Computer Science 2017-02-22 Sudhakar Singh, Rakhi Garg, P. K. Mishra

In this thesis report, we survey state-of-the-art methods for modelling the resource utilization of MapReduce applications with regard to their configuration parameters. After implementing one of the algorithms in the literature, we tried…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-03-21 Hamidreza Barati, Nasrin Jaberi

MapReduce (and its open source implementation Hadoop) has become the de facto platform for processing large data sets. MapReduce offers a streamlined computational framework by interleaving sequential and parallel computation while hiding…

Computational Complexity · Computer Science 2019-04-22 Sungjin Im, Benjamin Moseley

The explosion of Big Data was followed by the proliferation of numerous complex parallel software stacks whose aim is to tackle the challenges of data deluge. A drawback of such a multi-layered hierarchical deployment is the inability to…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-04-01 Colin Barrett, Christos Kotselidis, Mikel Luján

The programming paradigm Map-Reduce and its main open-source implementation, Hadoop, have had an enormous impact on large scale data processing. Our goal in this expository writeup is two-fold: first, we want to present some complexity…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-11-29 Ashish Goel, Kamesh Munagala