Related papers: Flora: Efficient Cloud Resource Selection for Big …

200 papers

Distributed dataflow systems enable data-parallel processing of large datasets on clusters. Public cloud providers offer a large variety and quantity of resources that can be used for such clusters. Yet, selecting appropriate cloud…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-12-03 Jonathan Will , Lauritz Thamsen , Dominik Scheinert , Jonathan Bader , Odej Kao

Distributed dataflow systems like Spark and Flink enable data-parallel processing of large datasets on clusters. Yet, selecting appropriate computational resources for dataflow jobs is often challenging. For efficient execution, individual…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-27 Jonathan Will , Nico Treide , Lauritz Thamsen , Odej Kao

Distributed dataflow systems like Apache Spark and Apache Hadoop enable data-parallel processing of large datasets on clusters. Yet, selecting appropriate computational resources for dataflow jobs -- that neither lead to bottlenecks nor to…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-11 Jonathan Will , Lauritz Thamsen , Jonathan Bader , Dominik Scheinert , Odej Kao

Analyzing large datasets with distributed dataflow systems requires the use of clusters. Public cloud providers offer a large variety and quantity of resources that can be used for such clusters. However, picking the appropriate resources…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-04-28 Jonathan Will , Jonathan Bader , Lauritz Thamsen

Distributed dataflow systems such as Apache Spark or Apache Flink enable parallel, in-memory data processing on large clusters of commodity hardware. Consequently, the appropriate amount of memory to allocate to the cluster is a crucial…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-08 Jonathan Will , Lauritz Thamsen , Dominik Scheinert , Odej Kao

Distributed dataflow systems like Apache Flink and Apache Spark simplify processing large amounts of data on clusters in a data-parallel manner. However, choosing suitable cluster resources for distributed dataflow jobs in both type and…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-14 Jonathan Will , Onur Arslan , Jonathan Bader , Dominik Scheinert , Lauritz Thamsen

Distributed data processing systems like MapReduce, Spark, and Flink are popular tools for analysis of large datasets with cluster resources. Yet, users often overprovision resources for their data processing jobs, while the resource usage…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-02-16 Lauritz Thamsen , Ilya Verbitskiy , Sasho Nedelkoski , Vinh Thuy Tran , Vinicius Meyer , Miguel G. Xavier , Odej Kao , Cesar A. F. De Rose

Many organizations routinely analyze large datasets using systems for distributed data-parallel processing and clusters of commodity resources. Yet, users need to configure adequate resources for their data processing jobs. This requires…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-02 Lauritz Thamsen , Dominik Scheinert , Jonathan Will , Jonathan Bader , Odej Kao

Distributed Data Processing Platforms (e.g., Hadoop, Spark, and Flink) are widely used to store and process data in a cloud environment. These platforms distribute the storage and processing of data among the computing nodes of a cloud. The…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-12-08 Isuru Dharmadasa , Faheem Ullah

Selecting appropriate computational resources for data processing jobs on large clusters is difficult, even for expert users like data engineers. Inadequate choices can result in vastly increased costs, without significantly improving…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-02-06 Jonathan Will , Lauritz Thamsen , Jonathan Bader , Dominik Scheinert , Odej Kao

Efficient resource allocation is a key challenge in modern cloud computing. Over-provisioning leads to unnecessary costs, while under-provisioning risks performance degradation and SLA violations. This work presents an artificial…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-08 Harshit Goyal

Distributed data processing frameworks (e.g., Hadoop, Spark, and Flink) are widely used to distribute data among computing nodes of a cloud. Recently, there have been increasing efforts aimed at evaluating the performance of distributed…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-01-07 Faheem Ullah , Shagun Dhingra , Xiaoyu Xia , M. Ali Babar

Distributed computing, such as cloud computing, provides promising platforms to execute multiple workflows. Workflow scheduling plays an important role in multi-workflow execution with multi-objective requirements. Although there exist many…

Artificial Intelligence · Computer Science 2022-05-24 Feng Li , Wen Jun Tan , Wentong Cai

Selecting the right resources for big data analytics jobs is hard because of the wide variety of configuration options like machine type and cluster size. As poor choices can have a significant impact on resource efficiency, cost, and…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-27 Dominik Scheinert , Philipp Wiesner , Thorsten Wittkopp , Lauritz Thamsen , Jonathan Will , Odej Kao

Training and deploying deep learning models in real-world applications require processing large amounts of data. This is a challenging task when the amount of data grows to a hundred terabytes, or even, petabyte-scale. We introduce a hybrid…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-17 Davit Buniatyan

Big data processing applications are becoming more and more complex. They are no more monolithic in nature but instead they are composed of decoupled analytical processes in the form of a workflow. One type of such workflow applications is…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-12-19 Mutaz Barika , Saurabh Garg , Andrew Chan , Rodrigo N. Calheiros

Distributed in-memory data processing engines accelerate iterative applications by caching substantial datasets in memory rather than recomputing them in each iteration. Selecting a suitable cluster size for caching these datasets plays an…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-07-07 Hani Al-Sayeh , Muhammad Attahir Jibril , Bunjamin Memishi , Kai-Uwe Sattler

Distributed data processing platforms (e.g., Hadoop, Spark, and Flink) are widely used to distribute the storage and processing of data among computing nodes of a cloud. The centralization of cloud resources has given birth to edge…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-01-07 Faheem Ullah , Imaduddin Mohammed , M. Ali Babar

In Cloud Computing, the resource provisioning approach used has a great impact on the processing cost, especially when it is used for Big Data processing. Due to data variety, the performance of virtual machines (VM) may differ based on the…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-08-12 Hossein Ahmadvand , Fouzhan Foroutan

In the world of Big Data analytics, there is a series of tools aiming at simplifying programming applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models, for which only…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-17 Claudia Misale , Maurizio Drocco , Marco Aldinucci , Guy Tremblay