Related papers: Automatic Optimization for MapReduce Programs

Analyzing Large-Scale, Distributed and Uncertain Data

The exponential growth of data in current times and the demand to gain information and knowledge from the data present new challenges for database researchers. Known database systems and algorithms are no longer capable of effectively…

Databases · Computer Science 2017-12-06 Yaron Gonen

Optimizing MapReduce for Highly Distributed Environments

MapReduce, the popular programming paradigm for large-scale data processing, has traditionally been deployed over tightly-coupled clusters where the data is already locally available. The assumption that the data and compute resources are…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-07-31 Benjamin Heintz, Abhishek Chandra, Ramesh K. Sitaraman

The Family of MapReduce and Large Scale Data Processing Systems

In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a…

Databases · Computer Science 2013-02-14 Sherif Sakr, Anna Liu, Ayman G. Fayoumi

Iterative MapReduce for Large Scale Machine Learning

Large datasets ("Big Data") are becoming ubiquitous because the potential value in deriving insights from data, across a wide range of business and scientific applications, is increasingly recognized. In particular, machine learning - one…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-03-15 Joshua Rosen, Neoklis Polyzotis, Vinayak Borkar, Yingyi Bu, Michael J. Carey, Markus Weimer, Tyson Condie, Raghu Ramakrishnan

Towards co-designed optimizations in parallel frameworks: A MapReduce case study

The explosion of Big Data was followed by the proliferation of numerous complex parallel software stacks whose aim is to tackle the challenges of data deluge. A drawback of such a multi-layered hierarchical deployment is the inability to…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-04-01 Colin Barrett, Christos Kotselidis, Mikel Luján

Searching for More Efficient Dynamic Programs

Computational models of human language often involve combinatorial problems. For instance, a probabilistic parser may marginalize over exponentially many trees to make predictions. Algorithms for such problems often employ dynamic…

Computation and Language · Computer Science 2021-09-16 Tim Vieira, Ryan Cotterell, Jason Eisner

MapReduce Scheduler: A 360-degree view

Undoubtedly, MapReduce is the most powerful programming paradigm in distributed computing. Enhancing MapReduce is essential, and it can make computing faster. Therefore, here are many scheduling algorithms to discuss…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-04-11 Rajdeep Das, Rohit Pratap Singh, Ripon Patgiri

MapReduce Meets Fine-Grained Complexity: MapReduce Algorithms for APSP, Matrix Multiplication, 3-SUM, and Beyond

Distributed processing frameworks such as MapReduce, Hadoop, and Spark are popular systems for processing large amounts of data. The design of efficient algorithms in these frameworks is a challenging problem, as the systems both require…

Data Structures and Algorithms · Computer Science 2019-05-07 MohammadTaghi Hajiaghayi, Silvio Lattanzi, Saeed Seddighin, Cliff Stein

A-MapReduce: Executing Wide Search via Agentic MapReduce

Contemporary large language model (LLM)-based multi-agent systems exhibit systematic advantages in deep research tasks, which emphasize iterative, vertically structured information seeking. However, when confronted with wide search tasks…

Multiagent Systems · Computer Science 2026-02-03 Mingju Chen, Guibin Zhang, Heng Chang, Yuchen Guo, Shiji Zhou

LLMapReduce: Multi-Level Map-Reduce for High Performance Data Analysis

The map-reduce parallel programming model has become extremely popular in the big data community. Many big data workloads can benefit from the enhanced performance offered by supercomputers. LLMapReduce provides the familiar map-reduce…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-12-13 Chansup Byun, Jeremy Kepner, William Arcand, David Bestor, Bill Bergeron, Vijay Gadepally, Matthew Hubbell, Peter Michaleas, Julie Mullen, Andrew Prout, Antonio Rosa, Charles Yee, Albert Reuther

A Novel Approach to Translate Structural Aggregation Queries to MapReduce Code

Data management applications are growing and require more attention, especially in the "big data" era. Thus, supporting such applications with novel and efficient algorithms that achieve higher performance is critical. Array database…

Databases · Computer Science 2025-02-04 Ahmed M. Abdelmoniem, Sameh Abdulah, Walid Atwa

AutoML in Heavily Constrained Applications

Optimizing a machine learning pipeline for a task at hand requires careful configuration of various hyperparameters, typically supported by an AutoML system that optimizes the hyperparameters for the given training dataset. Yet, depending…

Machine Learning · Computer Science 2023-10-17 Felix Neutatz, Marius Lindauer, Ziawasch Abedjan

Sorting, Searching, and Simulation in the MapReduce Framework

In this paper, we study the MapReduce framework from an algorithmic standpoint and demonstrate the usefulness of our approach by designing and analyzing efficient MapReduce algorithms for fundamental sorting, searching, and simulation…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-01-11 Michael T. Goodrich, Nodari Sitchinava, Qin Zhang

Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications

MapReduce is a popular programming paradigm for developing large-scale, data-intensive computation. Many frameworks that implement this paradigm have recently been developed. To leverage these frameworks, however, developers must become…

Databases · Computer Science 2018-06-20 Maaz Bin Safeer Ahmad, Alvin Cheung

Efficient Non-Parametric Optimizer Search for Diverse Tasks

Efficient and automated design of optimizers plays a crucial role in full-stack AutoML systems. However, prior methods in optimizer search are often limited by their scalability, generability, or sample efficiency. With the goal of…

Machine Learning · Computer Science 2022-09-29 Ruochen Wang, Yuanhao Xiong, Minhao Cheng, Cho-Jui Hsieh

Review of Apriori Based Algorithms on MapReduce Framework

The Apriori algorithm, which mines frequent itemsets, is one of the most popular and widely used data mining algorithms. Nowadays, many algorithms have been proposed on parallel and distributed platforms to enhance the performance of Apriori…

Databases · Computer Science 2017-02-22 Sudhakar Singh, Rakhi Garg, P. K. Mishra

Submodular Optimization in the MapReduce Model

Submodular optimization has received significant attention in both practice and theory, as a wide array of problems in machine learning, auction theory, and combinatorial optimization have submodular structure. In practice, these problems…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-10-04 Paul Liu, Jan Vondrak

Energy Efficient Scheduling of MapReduce Jobs

MapReduce has emerged as a prominent programming model for data-intensive computation. In this work, we study power-aware MapReduce scheduling in the speed scaling setting first introduced by Yao et al. [FOCS 1995]. We focus on the…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-02-13 Evripidis Bampis, Vincent Chau, Dimitrios Letsios, Giorgio Lucarelli, Ioannis Milis, Georgios Zois

An Open-Source Project for MapReduce Performance Self-Tuning

Many Hadoop configuration parameters have a significant influence on the performance of MapReduce jobs running on Hadoop. It is time-consuming and tedious for general users to manually tune the parameters for optimal MapReduce performance.…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-01 Donghua Chen

Dynamic Design of Machine Learning Pipelines via Metalearning

Automated machine learning (AutoML) has democratized the design of machine-learning-based systems by automating model selection, hyperparameter tuning, and feature engineering. However, the high computational cost associated with…

Machine Learning · Computer Science 2025-08-20 Edesio Alcobaça, André C. P. L. F. de Carvalho