Related papers: Bioinformatics Computational Cluster Batch Task Pr…

200 papers

We address the problem of predicting whether sufficient memory and CPU resources have been requested for jobs at submission time. For this purpose, we examine the task of training a supervised machine learning system to predict the outcome…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-05 Dan Andresen, William Hsu, Huichen Yang, Adedolapo Okanlawon

Resource allocation in High Performance Computing (HPC) settings is still not easy for end-users due to the wide variety of application and environment configuration options. Users have difficulties to estimate the number of processors and…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-10 Eduardo R. Rodrigues, Renato L. F. Cunha, Marco A. S. Netto, Michael Spriggs

Although High Performance Computing (HPC) users understand basic resource requirements such as the number of CPUs and memory limits, internal infrastructural utilization data is exclusively leveraged by cluster operators, who use it to…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-19 Abel Souza, Kristiaan Pelckmans, Johan Tordsson

We consider a natural scheduling problem which arises in many distributed computing frameworks. Jobs with diverse resource requirements (e.g. memory requirements) arrive over time and must be served by a cluster of servers, each with a…

Networking and Internet Architecture · Computer Science 2019-01-21 Konstantinos Psychas, Javad Ghaderi

Scientific workflow management systems support large-scale data analysis on cluster infrastructures. For this, they interact with resource managers which schedule workflow tasks onto cluster nodes. In addition to workflow task descriptions,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-30 Jonathan Bader, Kathleen West, Soeren Becker, Svetlana Kulagina, Fabian Lehmann, Lauritz Thamsen, Henning Meyerhenke, Odej Kao

In the rapidly evolving research on artificial intelligence (AI) the demand for fast, computationally efficient, and scalable solutions has increased in recent years. The problem of optimizing the computing resources for distributed machine…

Machine Learning · Computer Science 2025-10-30 Mohammadreza Doostmohammadian, Zulfiya R. Gabidullina, Hamid R. Rabiee

Failed workloads that consumed significant computational resources in time and space affect the efficiency of data centers significantly and thus limit the amount of scientific work that can be achieved. While the computational power has…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-13 Jie Li, Rui Wang, Ghazanfar Ali, Tommy Dang, Alan Sill, Yong Chen

This study presents a machine learning-assisted approach to optimize task scheduling in cluster systems, focusing on node-affinity constraints. Traditional schedulers like Kubernetes struggle with real-time adaptability, whereas the…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-30 Leszek Sliwko, Jolanta Mizera-Pietraszko

We present a scheduler that improves cluster utilization and job completion times by packing tasks having multi-resource requirements and inter-dependencies. While the problem is algorithmically very hard, we achieve near-optimality on the…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-04-26 Robert Grandl, Srikanth Kandula, Sriram Rao, Aditya Akella, Janardhan Kulkarni

Many organizations routinely analyze large datasets using systems for distributed data-parallel processing and clusters of commodity resources. Yet, users need to configure adequate resources for their data processing jobs. This requires…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-02 Lauritz Thamsen, Dominik Scheinert, Jonathan Will, Jonathan Bader, Odej Kao

Reliability is a fundamental challenge in operating large-scale machine learning (ML) infrastructures, particularly as the scale of ML models and training clusters continues to grow. Despite decades of research on infrastructure failures,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-10 Apostolos Kokolis, Michael Kuchnik, John Hoffman, Adithya Kumar, Parth Malani, Faye Ma, Zachary DeVito, Shubho Sengupta, Kalyan Saladi, Carole-Jean Wu

Nowadays large-scale distributed machine learning systems have been deployed to support various analytics and intelligence services in IT firms. To train a large dataset and derive the prediction/inference model, e.g., a deep neural…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-04 Yixin Bao, Yanghua Peng, Chuan Wu, Zongpeng Li

The operating system's role in a computer system is to manage the various resources. One of these resources is the Central Processing Unit. It is managed by a component of the operating system called the CPU scheduler. Schedulers are…

Operating Systems · Computer Science 2010-11-09 George Anderson, Tshilidzi Marwala, Fulufhelo V. Nelwamondo

Background: Large-scale biological jobs on high-performance computing systems require manual intervention if one or more computing cores on which they execute fail. This places not only a cost on the maintenance of the job, but also a cost…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-03-04 Blesson Varghese, Gerard McKee, Vassil Alexandrov

We consider a parallel system of $m$ identical machines prone to unpredictable crashes and restarts, trying to cope with the continuous arrival of tasks to be executed. Tasks have different computational requirements (i.e., processing time…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-03-21 Elli Zavou, Antonio Fernández Anta

This paper presents the Container Profiler, a software tool that measures and records the resource usage of any containerized task. Our tool profiles the CPU, memory, disk, and network utilization of containerized tasks collecting over…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-02-08 Varik Hoang, Ling-Hong Hung, David Perez, Huazeng Deng, Raymond Schooley, Niharika Arumilli, Ka Yee Yeung, Wes Lloyd

Recommendation algorithms perform differently if the users, recommendation contexts, applications, and user interfaces vary even slightly. It is similarly observed in other fields, such as combinatorial problem solving, that algorithms…

Information Retrieval · Computer Science 2021-01-01 Andrew Collins, Laura Tierney, Joeran Beel

Many algorithms in workflow scheduling and resource provisioning rely on the performance estimation of tasks to produce a scheduling plan. A profiler that is capable of modeling the execution of tasks and predicting their runtime…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-03-01 Muhammad H. Hilman, Maria A. Rodriguez, Rajkumar Buyya

With the rapid growth of the data volume and the fast increasing of the computational model complexity in the scenario of cloud computing, it becomes an important topic that how to handle users' requests by scheduling computational jobs and…

Machine Learning · Computer Science 2021-05-10 Zheqi Zhu, Pingyi Fan

Scheduling of constrained deadline sporadic task systems on multiprocessor platforms is an area which has received much attention in the recent past. It is widely believed that finding an optimal scheduler is hard, and therefore most…

Operating Systems · Computer Science 2020-04-07 Arvind Easwaran, Insik Shin, Insup Lee