Related papers: Stubby: A Transformation-based Optimizer for MapRe…

Optimizing MapReduce for Highly Distributed Environments

MapReduce, the popular programming paradigm for large-scale data processing, has traditionally been deployed over tightly-coupled clusters where the data is already locally available. The assumption that the data and compute resources are…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-07-31 Benjamin Heintz, Abhishek Chandra, Ramesh K. Sitaraman

Optimization towards Efficiency and Stateful of dispel4py

Scientific workflows bridge scientific challenges with computational resources. While dispel4py, a stream-based workflow system, offers mappings to parallel enactment engines like MPI or Multiprocessing, its optimization primarily focuses…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-09-04 Liang Liang, Heting Zhang, Guang Yang, Thomas Heinis, Rosa Filgueira

MapReduce Scheduler: A 360-degree view

Undoubtedly, MapReduce is the most powerful programming paradigm in distributed computing. Enhancing MapReduce is essential and can make computation faster. Therefore, there are many scheduling algorithms to discuss…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-04-11 Rajdeep Das, Rohit Pratap Singh, Ripon Patgiri

Real-time topology optimization via learnable mappings

In traditional topology optimization, the computing time required to iteratively update the material distribution within a design domain strongly depends on the complexity or size of the problem, limiting its application in real engineering…

Computational Engineering, Finance, and Science · Computer Science 2024-05-14 Gabriel Garayalde, Matteo Torzoni, Matteo Bruggi, Alberto Corigliano

Improving the Load Balance of MapReduce Operations based on the Key Distribution of Pairs

Load balance is important for MapReduce to reduce job duration, increase parallel efficiency, etc. Previous work focuses on coarse-grained scheduling. This study concerns fine-grained scheduling on MapReduce operations. Each operation…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-04-15 Liya Fan, Bo Gao, Xi Sun, Fa Zhang, Zhiyong Liu

Scheduling Data Intensive Workloads through Virtualization on MapReduce based Clouds

MapReduce has become a popular programming model for running data intensive applications on the cloud. Completion time goals or deadlines of MapReduce jobs set by users are becoming crucial in existing cloud-based data processing…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-08-10 B. Thirumala Rao, L. S. S. Reddy

Scheduling MapReduce Jobs and Data Shuffle on Unrelated Processors

We propose constant approximation algorithms for generalizations of the Flexible Flow Shop (FFS) problem which form a realistic model for non-preemptive scheduling in MapReduce systems. Our results concern the minimization of the total…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-06-25 Dimitrios Fotakis, Ioannis Milis, Emmanouil Zampetakis, Georgios Zois

Effective Spatial Data Partitioning for Scalable Query Processing

Recently, MapReduce based spatial query systems have emerged as a cost effective and scalable solution to large scale spatial data processing and analytics. MapReduce based systems achieve massive scalability by partitioning the data and…

Databases · Computer Science 2015-09-04 Ablimit Aji, Vo Hoang, Fusheng Wang

Cost optimization of data flows based on task re-ordering

Analyzing big data in a highly dynamic environment is becoming more and more critical because of the increasing need for end-to-end processing of this data. Modern data flows are quite complex, and there are no efficient, cost-based,…

Databases · Computer Science 2015-07-31 Georgia Kougka, Anastasios Gounaris

Resolvable Designs for Speeding up Distributed Computing

Distributed computing frameworks such as MapReduce are often used to process large computational jobs. They operate by partitioning each job into smaller tasks executed on different servers. The servers also need to exchange intermediate…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-20 Konstantinos Konstantinidis, Aditya Ramamoorthy

Assignment Problems of Different-Sized Inputs in MapReduce

A MapReduce algorithm can be described by a mapping schema, which assigns inputs to a set of reducers, such that for each required output there exists a reducer that receives all the inputs that participate in the computation of this…

Databases · Computer Science 2016-10-21 Foto Afrati, Shlomi Dolev, Ephraim Korach, Shantanu Sharma, Jeffrey D. Ullman

Assignment of Different-Sized Inputs in MapReduce

A MapReduce algorithm can be described by a mapping schema, which assigns inputs to a set of reducers, such that for each required output there exists a reducer that receives all the inputs that participate in the computation of this…

Databases · Computer Science 2015-01-28 Foto Afrati, Shlomi Dolev, Ephraim Korach, Shantanu Sharma, Jeffrey D. Ullman

Interactive Analytical Processing in Big Data Systems: A Cross-Industry Study of MapReduce Workloads

Within the past few years, organizations in diverse industries have adopted MapReduce-based systems for large-scale data processing. Along with these new users, important new workloads have emerged which feature many small, short, and…

Databases · Computer Science 2012-08-22 Yanpei Chen, Sara Alspaugh, Randy Katz

A machine learning based algorithm selection method to solve the minimum cost flow problem

The minimum cost flow problem is one of the most studied network optimization problems and appears in numerous applications. Some efficient algorithms exist for this problem, which are freely available in the form of libraries or software…

Machine Learning · Computer Science 2022-10-06 Philipp Herrmann, Anna Meyer, Stefan Ruzika, Luca E. Schäfer, Fabian von der Warth

Automatic Optimization for MapReduce Programs

The MapReduce distributed programming framework has become popular, despite evidence that current implementations are inefficient, requiring far more hardware than a traditional relational database to complete similar tasks. MapReduce jobs…

Databases · Computer Science 2011-04-19 Eaman Jahani, Michael J. Cafarella, Christopher Ré

Energy-Efficient Edge-Facilitated Wireless Collaborative Computing using Map-Reduce

In this work, a heterogeneous set of wireless devices sharing a common access point collaborates to perform a set of tasks. Using the Map-Reduce distributed computing framework, the tasks are optimally distributed amongst the nodes with the…

Signal Processing · Electrical Eng. & Systems 2019-03-07 Antoine Paris, Hamed Mirghasemi, Ivan Stupia, Luc Vandendorpe

The Many Faces of Data-centric Workflow Optimization: A Survey

Workflow technology is rapidly evolving and, rather than being limited to modeling the control flow in business processes, is becoming a key mechanism to perform advanced data management, such as big data analytics. This survey focuses on…

Databases · Computer Science 2017-01-27 Georgia Kougka, Anastasios Gounaris, Alkis Simitsis

Scalable Data Cube Analysis over Big Data

Data cubes are widely used as a powerful tool to provide multidimensional views in data warehousing and On-Line Analytical Processing (OLAP). However, with increasing data sizes, it is becoming computationally expensive to perform data cube…

Databases · Computer Science 2013-11-25 Zhengkui Wang, Yan Chu, Kian-Lee Tan, Divyakant Agrawal, Amr El Abbadi, Xiaolong Xu

A Review of Tools and Techniques for Optimization of Workload Mapping and Scheduling in Heterogeneous HPC System

This paper presents a systematic review of mapping and scheduling strategies within the High-Performance Computing (HPC) compute continuum, with a particular emphasis on heterogeneous systems. It introduces a prototype workflow to establish…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-19 Aasish Kumar Sharma, Julian Kunkel

FasterPy: An LLM-based Code Execution Efficiency Optimization Framework

Code often suffers from performance bugs. These bugs necessitate the research and practice of code optimization. Traditional rule-based methods rely on manually designing and maintaining rules for specific performance bugs (e.g., redundant…

Software Engineering · Computer Science 2025-12-30 Yue Wu, Minghao Han, Ruiyin Li, Peng Liang, Amjed Tahir, Zengyang Li, Qiong Feng, Mojtaba Shahin