Related papers: GENMR: Generalized Query Processing through Map Re…

Analyzing Large-Scale, Distributed and Uncertain Data

The exponential growth of data in current times and the demand to gain information and knowledge from the data present new challenges for database researchers. Known database systems and algorithms are no longer capable of effectively…

Databases · Computer Science 2017-12-06 Yaron Gonen

A Survey of Big Data Machine Learning Applications Optimization in Cloud Data Centers and Networks

This survey article reviews the challenges associated with deploying and optimizing big data applications and machine learning algorithms in cloud data centers and networks. The MapReduce programming model and its widely-used open-source…

Networking and Internet Architecture · Computer Science 2019-10-03 Sanaa Hamid Mohamed , Taisir E. H. El-Gorashi , Jaafar M. H. Elmirghani

The Family of MapReduce and Large Scale Data Processing Systems

In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a…

Databases · Computer Science 2013-02-14 Sherif Sakr , Anna Liu , Ayman G. Fayoumi

A Novel Approach to Translate Structural Aggregation Queries to MapReduce Code

Data management applications are growing and require more attention, especially in the "big data" era. Thus, supporting such applications with novel and efficient algorithms that achieve higher performance is critical. Array database…

Databases · Computer Science 2025-02-04 Ahmed M. Abdelmoniem , Sameh Abdulah , Walid Atwa

HRDBMS: Combining the Best of Modern and Traditional Relational Databases

HRDBMS is a novel distributed relational database that uses a hybrid model combining the best of traditional distributed relational databases and Big Data analytics platforms such as Hive. This allows HRDBMS to leverage years' worth of…

Databases · Computer Science 2019-01-28 Jason Arnold , Boris Glavic , Ioan Raicu

A Survey on Geographically Distributed Big-Data Processing using MapReduce

Hadoop and Spark are widely used distributed processing frameworks for large-scale data processing in an efficient and fault-tolerant manner on private or public clouds. These big-data processing systems are extensively used by many…

Databases · Computer Science 2017-07-07 Shlomi Dolev , Patricia Florissi , Ehud Gudes , Shantanu Sharma , Ido Singer

Transplantation of Data Mining Algorithms to Cloud Computing Platform when Dealing Big Data

This paper presented a short review of Cloud Computing and Big Data and discussed the portability of general data mining algorithms to the Cloud Computing platform. It revealed that the Cloud Computing platform based on Map-Reduce cannot solve all the…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-02-07 Yong Wang , Ya Wei Zhao

GenDB: The Next Generation of Query Processing -- Synthesized, Not Engineered

Traditional query processing relies on engines that are carefully optimized and engineered by many experts. However, new techniques and user requirements evolve rapidly, and existing systems often cannot keep pace. At the same time, these…

Databases · Computer Science 2026-03-03 Jiale Lao , Immanuel Trummer

Petuum: A New Platform for Distributed Machine Learning on Big Data

What is a systematic way to efficiently apply a wide spectrum of advanced ML programs to industrial scale problems, using Big Models (up to 100s of billions of parameters) on Big Data (up to terabytes or petabytes)? Modern parallelization…

Machine Learning · Statistics 2015-05-18 Eric P. Xing , Qirong Ho , Wei Dai , Jin Kyu Kim , Jinliang Wei , Seunghak Lee , Xun Zheng , Pengtao Xie , Abhimanu Kumar , Yaoliang Yu

LLMapReduce: Multi-Level Map-Reduce for High Performance Data Analysis

The map-reduce parallel programming model has become extremely popular in the big data community. Many big data workloads can benefit from the enhanced performance offered by supercomputers. LLMapReduce provides the familiar map-reduce…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-12-13 Chansup Byun , Jeremy Kepner , William Arcand , David Bestor , Bill Bergeron , Vijay Gadepally , Matthew Hubbell , Peter Michaleas , Julie Mullen , Andrew Prout , Antonio Rosa , Charles Yee , Albert Reuther

Iterative MapReduce for Large Scale Machine Learning

Large datasets ("Big Data") are becoming ubiquitous because the potential value in deriving insights from data, across a wide range of business and scientific applications, is increasingly recognized. In particular, machine learning - one…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-03-15 Joshua Rosen , Neoklis Polyzotis , Vinayak Borkar , Yingyi Bu , Michael J. Carey , Markus Weimer , Tyson Condie , Raghu Ramakrishnan

GenMapping: Unleashing the Potential of Inverse Perspective Mapping for Robust Online HD Map Construction

Online High-Definition (HD) maps have emerged as the preferred option for autonomous driving, overshadowing the counterpart offline HD maps due to flexible update capability and lower maintenance costs. However, contemporary online HD map…

Computer Vision and Pattern Recognition · Computer Science 2024-09-16 Siyu Li , Kailun Yang , Hao Shi , Song Wang , You Yao , Zhiyong Li

Survey on Improved Scheduling in Hadoop MapReduce in Cloud Environments

Cloud Computing is emerging as a new computational paradigm. Hadoop-MapReduce has become a powerful computation model for processing large data on distributed commodity hardware clusters such as Clouds. In all Hadoop implementations,…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-07-04 B. Thirumala Rao , L. S. S. Reddy

Large-scale Data Modelling in Hive and Distributed Query Processing using MapReduce and Tez

Huge amounts of data are being generated continuously by digitally interconnected systems of humans, organizations and machines. Data comes in a variety of formats, including structured, unstructured and semi-structured, which makes it impossible…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-31 Abzetdin Adamov

Analyzing Big Datasets of Genomic Sequences: Fast and Scalable Collection of k-mer Statistics

Distributed approaches based on the map-reduce programming paradigm have started to be proposed in the bioinformatics domain, due to the large amount of data produced by the next-generation sequencing techniques. However, the use of…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-07-05 Umberto Ferraro Petrillo , Mara Sorella , Giuseppe Cattaneo , Raffaele Giancarlo , Simona Rombo

ReStore: Reusing Results of MapReduce Jobs

Analyzing large scale data has emerged as an important activity for many organizations in the past few years. This large scale data analysis is facilitated by the MapReduce programming and execution model and its implementations, most…

Databases · Computer Science 2012-03-02 Iman Elghandour , Ashraf Aboulnaga

Beyond Batch Processing: Towards Real-Time and Streaming Big Data

Today, big data is generated from many sources, and there is a huge demand for storing, managing, processing, and querying big data. The MapReduce model and its open-source implementation, Hadoop, have proven themselves as the de…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-08-04 Saeed Shahrivari , Saeed Jalili

On the Complexity of Processing Massive, Unordered, Distributed Data

An existing approach for dealing with massive data sets is to stream over the input in few passes and perform computations with sublinear resources. This method does not work for truly massive data where even making a single pass over the…

Computational Complexity · Computer Science 2007-05-23 Jon Feldman , S. Muthukrishnan , Anastasios Sidiropoulos , Cliff Stein , Zoya Svitkina

Scalable Ontological Query Processing over Semantically Integrated Life Science Datasets using MapReduce

To address the requirement of enabling a comprehensive perspective of life-sciences data, Semantic Web technologies have been adopted for standardized representations of data and linkages between data. This has resulted in data warehouses…

Databases · Computer Science 2016-02-03 HyeongSik Kim , Kemafor Anyanwu

M3R: Increased performance for in-memory Hadoop jobs

Main Memory Map Reduce (M3R) is a new implementation of the Hadoop Map Reduce (HMR) API targeted at online analytics on high mean-time-to-failure clusters. It does not support resilience, and supports only those workloads which can fit into…

Databases · Computer Science 2012-08-22 Avraham Shinnar , David Cunningham , Benjamin Herta , Vijay Saraswat