Related papers: Sequence-to-sequence models for workload interfere…

200 papers

Distributed cloud environments hosting data-intensive applications often experience slowdowns due to network congestion, asymmetric bandwidth, and inter-node data shuffling. These factors are typically not captured by traditional host-level…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-11-21 · Sankalpa Timilsina, Susmit Shannigrahi

Many organizations routinely analyze large datasets using systems for distributed data-parallel processing and clusters of commodity resources. Yet, users need to configure adequate resources for their data processing jobs. This requires…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-06-02 · Lauritz Thamsen, Dominik Scheinert, Jonathan Will, Jonathan Bader, Odej Kao

Although High Performance Computing (HPC) users understand basic resource requirements such as the number of CPUs and memory limits, internal infrastructural utilization data is exclusively leveraged by cluster operators, who use it to…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-01-19 · Abel Souza, Kristiaan Pelckmans, Johan Tordsson

This paper investigates co-scheduling algorithms for processing a set of parallel applications. Instead of executing each application one by one, using a maximum degree of parallelism for each of them, we aim at scheduling several…

Data Structures and Algorithms · Computer Science · 2013-05-01 · Guillaume Aupy, Manu Shantharam, Anne Benoit, Yves Robert, Padma Raghavan

We consider a natural scheduling problem which arises in many distributed computing frameworks. Jobs with diverse resource requirements (e.g. memory requirements) arrive over time and must be served by a cluster of servers, each with a…

Networking and Internet Architecture · Computer Science · 2019-01-21 · Konstantinos Psychas, Javad Ghaderi

Job submissions of parallel applications to production supercomputer systems will have to be carefully tuned in terms of the job submission parameters to obtain minimum response times. In this work, we have developed an end-to-end resource…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-08-20 · Swetha Hariharan, Prakash Murali, Abhishek Pasari, Sathish Vadhiyar

Job scheduling is a well-known Combinatorial Optimization problem with endless applications. Well planned schedules bring many benefits in the context of automated systems: among others, they limit production costs and waste. Nevertheless,…

Artificial Intelligence · Computer Science · 2023-08-04 · Giovanni Bonetta, Davide Zago, Rossella Cancelliere, Andrea Grosso

Increasing data volumes in scientific experiments necessitate the use of high-performance computing (HPC) resources for data analysis. In many scientific fields, the data generated from scientific instruments and supercomputer simulations…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-03-25 · Sam Nickolay, Eun-Sung Jung, Rajkumar Kettimuthu, Ian Foster

In this paper, a method for efficient scheduling to obtain optimum job throughput in a distributed campus grid environment is presented. Traditional job schedulers determine job scheduling using user and job resource attributes. User…

Distributed, Parallel, and Cluster Computing · Computer Science · 2010-07-15 · Srirangam V Addepallil, Per Andersen, George L Barnes

Molecular dynamics (MD) simulations are widely used to study large-scale molecular systems. HPC systems are ideal platforms to run these studies; however, reaching the necessary simulation timescale to detect rare processes is challenging,…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-08-22 · Tu Mai Anh Do, Loïc Pottier, Rafael Ferreira da Silva, Frédéric Suter, Silvina Caíno-Lores, Michela Taufer, Ewa Deelman

Minimizing job scheduling time is a fundamental issue in data center networks that has been extensively studied in recent years. The incoming jobs require different CPU and memory units, and span different numbers of time slots. The…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-11-21 · Weijia Chen, Yuedong Xu, Xiaofeng Wu

Scheduling a set of jobs over a collection of machines is a fundamental problem that needs to be solved millions of times a day in various computing platforms: in operating systems, in large data clusters, and in data centers. Along with…

Data Structures and Algorithms · Computer Science · 2018-07-10 · Janardhan Kulkarni, Shi Li

Failed workloads that consumed significant computational resources in time and space affect the efficiency of data centers significantly and thus limit the amount of scientific work that can be achieved. While the computational power has…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-01-13 · Jie Li, Rui Wang, Ghazanfar Ali, Tommy Dang, Alan Sill, Yong Chen

Deep neural network training jobs and other iterative computations frequently include checkpoints where jobs can be canceled based on the current value of monitored metrics. While most existing results focus on the performance of all…

Performance · Computer Science · 2022-09-30 · Yuan Yao, Marco Paolieri, Leana Golubchik

The under-exploitation of available resources risks becoming one of the main problems for a computing center. The growing demand for computational power necessarily entails more complex approaches in the management of the computing…

Distributed, Parallel, and Cluster Computing · Computer Science · 2012-05-01 · Federico Calzolari, Silvia Volpe

A queue is required when a service provider is not able to handle jobs as they arrive over time. In a highly flexible and dynamic environment, some jobs might demand faster execution at run time, especially when the resources are limited…

Performance · Computer Science · 2015-03-24 · Yash Gupta, Kamalakar Karlapalem

Task graphs provide a simple way to describe scientific workflows (sets of tasks with dependencies) that can be executed on both HPC clusters and in the cloud. An important aspect of executing such graphs is the scheduling algorithm used.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-04-18 · Jakub Beránek, Stanislav Böhm, Vojtěch Cima

To extract value from ever-growing volumes of data, coming from a number of different sources, and to drive decision making, organizations frequently resort to the composition of data processing workflows, since they are expressive,…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-12-13 · Sérgio Esteves, Helena Galhardas, Luís Veiga

Motivated by modern parallel computing applications, we consider the problem of scheduling parallel-task jobs with heterogeneous resource requirements in a cluster of machines. Each job consists of a set of tasks that can be processed in…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-04-03 · Mehrnoosh Shafiee, Javad Ghaderi

Analyzing large datasets with distributed dataflow systems requires the use of clusters. Public cloud providers offer a large variety and quantity of resources that can be used for such clusters. However, picking the appropriate resources…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-04-28 · Jonathan Will, Jonathan Bader, Lauritz Thamsen