Related papers: Collaborative Cluster Configuration for Distribute…

Towards Collaborative Optimization of Cluster Configurations for Distributed Dataflow Jobs

Analyzing large datasets with distributed dataflow systems requires the use of clusters. Public cloud providers offer a large variety and quantity of resources that can be used for such clusters. However, picking the appropriate resources…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-04-28 Jonathan Will , Jonathan Bader , Lauritz Thamsen

C3O: Collaborative Cluster Configuration Optimization for Distributed Data Processing in Public Clouds

Distributed dataflow systems enable data-parallel processing of large datasets on clusters. Public cloud providers offer a large variety and quantity of resources that can be used for such clusters. Yet, selecting appropriate cloud…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-12-03 Jonathan Will , Lauritz Thamsen , Dominik Scheinert , Jonathan Bader , Odej Kao

Training Data Reduction for Performance Models of Data Analytics Jobs in the Cloud

Distributed dataflow systems like Apache Flink and Apache Spark simplify processing large amounts of data on clusters in a data-parallel manner. However, choosing suitable cluster resources for distributed dataflow jobs in both type and…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-14 Jonathan Will , Onur Arslan , Jonathan Bader , Dominik Scheinert , Lauritz Thamsen

Towards a Peer-to-Peer Data Distribution Layer for Efficient and Collaborative Resource Optimization of Distributed Dataflow Applications

Performance modeling can help to improve the resource efficiency of clusters and distributed dataflow applications, yet the available modeling data is often limited. Collaborative approaches to performance modeling, characterized by the…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-24 Dominik Scheinert , Soeren Becker , Jonathan Will , Luis Englaender , Lauritz Thamsen

Bellamy: Reusing Performance Models for Distributed Dataflow Jobs Across Contexts

Distributed dataflow systems enable the use of clusters for scalable data analytics. However, selecting appropriate cluster resources for a processing job is often not straightforward. Performance models trained on historical executions of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-19 Dominik Scheinert , Lauritz Thamsen , Houkun Zhu , Jonathan Will , Alexander Acker , Thorsten Wittkopp , Odej Kao

Online Job Scheduling in Distributed Machine Learning Clusters

Nowadays large-scale distributed machine learning systems have been deployed to support various analytics and intelligence services in IT firms. To train a large dataset and derive the prediction/inference model, e.g., a deep neural…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-04 Yixin Bao , Yanghua Peng , Chuan Wu , Zongpeng Li

Performance modeling of a distributed file-system

Data centers have become the center of big data processing. Most programs running in a data center process big data. The storage requirements of such programs cannot be fulfilled by a single node in the data center, and hence a distributed…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-08-28 Sandeep Kumar

On the Potential of Execution Traces for Batch Processing Workload Optimization in Public Clouds

With the growing amount of data, data processing workloads and the management of their resource usage become increasingly important. Since managing a dedicated infrastructure is in many situations infeasible or uneconomical, users…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-01-19 Dominik Scheinert , Alireza Alamgiralem , Jonathan Bader , Jonathan Will , Thorsten Wittkopp , Lauritz Thamsen

Selecting Efficient Cluster Resources for Data Analytics: When and How to Allocate for In-Memory Processing?

Distributed dataflow systems such as Apache Spark or Apache Flink enable parallel, in-memory data processing on large clusters of commodity hardware. Consequently, the appropriate amount of memory to allocate to the cluster is a crucial…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-08 Jonathan Will , Lauritz Thamsen , Dominik Scheinert , Odej Kao

Sequence-to-sequence models for workload interference

Co-scheduling of jobs in data centers is a challenging scenario, where jobs can compete for resources, leading to severe slowdowns or failed executions. Efficient job placement in environments where resources are shared requires awareness…

Machine Learning · Computer Science 2020-07-07 David Buchaca Prats , Joan Marcual , Josep Lluís Berral , David Carrera

Trevor: Automatic configuration and scaling of stream processing pipelines

Operating a distributed data stream processing workload efficiently at scale is hard. The operator of the workload must parallelize and lay out tasks of the workload with resources that match the requirement of target data rate. The…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-12-27 Manu Bansal , Eyal Cidon , Arjun Balasingam , Aditya Gudipati , Christos Kozyrakis , Sachin Katti

Scheduling Jobs with Random Resource Requirements in Computing Clusters

We consider a natural scheduling problem which arises in many distributed computing frameworks. Jobs with diverse resource requirements (e.g. memory requirements) arrive over time and must be served by a cluster of servers, each with a…

Networking and Internet Architecture · Computer Science 2019-01-21 Konstantinos Psychas , Javad Ghaderi

Analysis of Distributed Algorithms for Big-data

Parallel and distributed processing are becoming the de facto industry standard, and a large part of current research is targeted at how to make computing scalable and distributed, dynamically, without allocating the resources on…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-10 Rajendra Purohit , K R Chowdhary , S D Purohit

Methods for Partitioning Data to Improve Parallel Execution Time for Sorting on Heterogeneous Clusters

The aim of the paper is to introduce general techniques in order to optimize the parallel execution time of sorting on distributed architectures with processors of various speeds. Such an application requires a partitioning step. For…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-08-16 Christophe Cérin , Jean-Christophe Dubacq , Jean-Louis Roch , the SafeScale Collaboration

Demeter: Resource-Efficient Distributed Stream Processing under Dynamic Loads with Multi-Configuration Optimization

Distributed Stream Processing (DSP) focuses on the near real-time processing of large streams of unbounded data. To increase processing capacities, DSP systems are able to dynamically scale across a cluster of commodity nodes, ensuring a…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-03-05 Morgan Geldenhuys , Dominik Scheinert , Odej Kao , Lauritz Thamsen

Runtime Variation in Big Data Analytics

The dynamic nature of resource allocation and runtime conditions in the cloud can result in high variability in a job's runtime across multiple iterations, leading to a poor experience. Identifying the sources of such variation and being able…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-10 Yiwen Zhu , Rathijit Sen , Robert Horton , John Mark Agosta

Configurable Runtime Orchestration for Dynamic Data Retrieval in Distributed Systems

Modern enterprise platforms increasingly depend on distributed microservices, analytical data platforms, and external APIs to construct composite responses for applications. Orchestrating data retrieval across these heterogeneous systems is…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-11 Abhiram Kandiraju

Flora: Efficient Cloud Resource Selection for Big Data Processing via Job Classification

Distributed dataflow systems like Spark and Flink enable data-parallel processing of large datasets on clusters of cloud resources. Yet, selecting appropriate computational resources for dataflow jobs is often challenging. For efficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-03 Jonathan Will , Lauritz Thamsen , Jonathan Bader , Odej Kao

A HPC Co-Scheduler with Reinforcement Learning

Although High Performance Computing (HPC) users understand basic resource requirements such as the number of CPUs and memory limits, internal infrastructural utilization data is exclusively leveraged by cluster operators, who use it to…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-19 Abel Souza , Kristiaan Pelckmans , Johan Tordsson

Machine Learning and CPU (Central Processing Unit) Scheduling Co-Optimization over a Network of Computing Centers

In the rapidly evolving research on artificial intelligence (AI) the demand for fast, computationally efficient, and scalable solutions has increased in recent years. The problem of optimizing the computing resources for distributed machine…

Machine Learning · Computer Science 2025-10-30 Mohammadreza Doostmohammadian , Zulfiya R. Gabidullina , Hamid R. Rabiee