Related papers: Duration-Informed Workload Scheduler

200 papers

Failed workloads that consumed significant computational resources in time and space affect the efficiency of data centers significantly and thus limit the amount of scientific work that can be achieved. While the computational power has…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-13 Jie Li, Rui Wang, Ghazanfar Ali, Tommy Dang, Alan Sill, Yong Chen

Job schedulers are a key component of scalable computing infrastructures. They orchestrate all of the work executed on the computing infrastructure and directly impact the effectiveness of the system. Recently, job workloads have…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-03-06 Albert Reuther, Chansup Byun, William Arcand, David Bestor, Bill Bergeron, Matthew Hubbell, Michael Jones, Peter Michaleas, Andrew Prout, Antonio Rosa, Jeremy Kepner

Distributed cloud environments hosting data-intensive applications often experience slowdowns due to network congestion, asymmetric bandwidth, and inter-node data shuffling. These factors are typically not captured by traditional host-level…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-21 Sankalpa Timilsina, Susmit Shannigrahi

In modern computer systems, jobs are divided into short tasks and executed in parallel. Empirical observations in practical systems suggest that the task service times are highly random and the job service time is bottlenecked by the…

Performance · Computer Science 2017-02-08 Yin Sun, C. Emre Koksal, Ness B. Shroff

In the rapidly expanding field of parallel processing, job schedulers are the "operating systems" of modern big data architectures and supercomputing systems. Job schedulers allocate computing resources and control the execution of…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-03-06 Albert Reuther, Chansup Byun, William Arcand, David Bestor, Bill Bergeron, Matthew Hubbell, Michael Jones, Peter Michaleas, Andrew Prout, Antonio Rosa, Jeremy Kepner

Motivated by the increasing importance of providing delay-guaranteed services in general computing and communication systems, and the recent wide adoption of learning and prediction in network control, in this work, we consider a general…

Networking and Internet Architecture · Computer Science 2018-01-08 Kun Chen, Longbo Huang

The demand for stringent interactive quality-of-service has intensified in both mobile edge computing (MEC) and cloud systems, driven by the imperative to improve user experiences. As a result, the processing of computation-intensive tasks…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-28 Ngoc Hung Nguyen, Van-Dinh Nguyen, Anh Tuan Nguyen, Nguyen Van Thieu, Hoang Nam Nguyen, Symeon Chatzinotas

Many hardware structures in today's high-performance out-of-order processors do not scale in an efficient way. To address this, different solutions have been proposed that build execution schedules in an energy-efficient manner. Issue time…

Hardware Architecture · Computer Science 2021-09-08 Andreas Diavastos, Trevor E. Carlson

Resource allocation in High Performance Computing (HPC) settings is still not easy for end-users due to the wide variety of application and environment configuration options. Users have difficulty estimating the number of processors and…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-10 Eduardo R. Rodrigues, Renato L. F. Cunha, Marco A. S. Netto, Michael Spriggs

Large Language Model (LLM) workloads have distinct prefill and decode phases with different compute and memory requirements which should ideally be accounted for when scheduling input queries across different LLM instances in a cluster.…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-08 Kunal Jain, Anjaly Parayil, Ankur Mallick, Esha Choukse, Xiaoting Qin, Jue Zhang, Íñigo Goiri, Rujia Wang, Chetan Bansal, Victor Rühle, Anoop Kulkarni, Steve Kofsky, Saravan Rajmohan

The paper presents an efficient real-time scheduling algorithm for intelligent real-time edge services, defined as those that perform machine intelligence tasks, such as voice recognition, LIDAR processing, or machine vision, on behalf of…

Machine Learning · Computer Science 2020-11-03 Shuochao Yao, Yifan Hao, Yiran Zhao, Huajie Shao, Dongxin Liu, Shengzhong Liu, Tianshi Wang, Jinyang Li, Tarek Abdelzaher

We present a scheduler that improves cluster utilization and job completion times by packing tasks having multi-resource requirements and inter-dependencies. While the problem is algorithmically very hard, we achieve near-optimality on the…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-04-26 Robert Grandl, Srikanth Kandula, Sriram Rao, Aditya Akella, Janardhan Kulkarni

Deep neural network training jobs and other iterative computations frequently include checkpoints where jobs can be canceled based on the current value of monitored metrics. While most existing results focus on the performance of all…

Performance · Computer Science 2022-09-30 Yuan Yao, Marco Paolieri, Leana Golubchik

The aim of this paper is to describe a deep-learning-based scheduling approach for academic-purpose high-performance computing systems. The share of academic-purpose distributed computing systems (DCS) reaches 17.4 percent…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-10-08 Andrey Gritsenko

Despite the fact that size-based schedulers can give excellent results in terms of both average response times and fairness, data-intensive computing execution engines generally do not employ size-based schedulers, mainly because of the…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-08-22 Matteo Dell'Amico

Multi-server queueing systems are widely used models for job scheduling in machine learning, wireless networks, crowdsourcing, and healthcare systems. This paper considers a multi-server system with multiple servers and multiple types of…

Machine Learning · Computer Science 2023-06-05 Zixian Yang, R. Srikant, Lei Ying

More and more companies have deployed machine learning (ML) clusters, where deep learning (DL) models are trained for providing various AI-driven services. Efficient resource scheduling is essential for maximal utilization of expensive DL…

Machine Learning · Computer Science 2019-09-16 Yanghua Peng, Yixin Bao, Yangrui Chen, Chuan Wu, Chen Meng, Wei Lin

Today high-performance computing (HPC) platforms are still dominated by batch jobs. Accordingly, effective batch job scheduling is crucial to obtain high system efficiency. Existing HPC batch job schedulers typically leverage heuristic…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-03 Di Zhang, Dong Dai, Youbiao He, Forrest Sheng Bao, Bing Xie

Co-scheduling of jobs in data centers is a challenging scenario, where jobs can compete for resources, leading to severe slowdowns or failed executions. Efficient job placement in environments where resources are shared requires awareness…

Machine Learning · Computer Science 2020-07-07 David Buchaca Prats, Joan Marcual, Josep Lluís Berral, David Carrera

Diverse workloads such as interactive supercomputing, big data analysis, and large-scale AI algorithm development require a high-performance scheduler. This paper presents a novel node-based scheduling approach for large-scale simulations…
