Related papers: Learning to Schedule: A Supervised Learning Framew…

200 papers

Recent years have witnessed a rapid growth of distributed machine learning (ML) frameworks, which exploit the massive parallelism of computing clusters to expedite ML training. However, the proliferation of distributed ML frameworks also…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-05-16 · Menglu Yu, Jia Liu, Chuan Wu, Bo Ji, Elizabeth S. Bentley

Nowadays large-scale distributed machine learning systems have been deployed to support various analytics and intelligence services in IT firms. To train a large dataset and derive the prediction/inference model, e.g., a deep neural…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-01-04 · Yixin Bao, Yanghua Peng, Chuan Wu, Zongpeng Li

A key functionality of emerging connected autonomous systems such as smart transportation systems, smart cities, and the industrial Internet-of-Things, is the ability to process and learn from data collected at different physical locations.…

Machine Learning · Computer Science · 2021-01-26 · Konstantinos Gatsis

We consider a natural scheduling problem which arises in many distributed computing frameworks. Jobs with diverse resource requirements (e.g. memory requirements) arrive over time and must be served by a cluster of servers, each with a…

Networking and Internet Architecture · Computer Science · 2019-01-21 · Konstantinos Psychas, Javad Ghaderi

Co-scheduling of jobs in data centers is a challenging scenario, where jobs can compete for resources, leading to severe slowdowns or failed executions. Efficient job placement in environments where resources are shared requires awareness…

Machine Learning · Computer Science · 2020-07-07 · David Buchaca Prats, Joan Marcual, Josep Lluís Berral, David Carrera

This study presents a machine learning-assisted approach to optimize task scheduling in cluster systems, focusing on node-affinity constraints. Traditional schedulers like Kubernetes struggle with real-time adaptability, whereas the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-09-30 · Leszek Sliwko, Jolanta Mizera-Pietraszko

More and more companies have deployed machine learning (ML) clusters, where deep learning (DL) models are trained for providing various AI-driven services. Efficient resource scheduling is essential for maximal utilization of expensive DL…

Machine Learning · Computer Science · 2019-09-16 · Yanghua Peng, Yixin Bao, Yangrui Chen, Chuan Wu, Chen Meng, Wei Lin

Efficient scheduling of distributed deep learning (DL) jobs in large GPU clusters is crucial for resource efficiency and job performance. While server sharing among jobs improves resource utilization, interference among co-located DL jobs…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-12-28 · Xiaoyang Zhao, Chuan Wu

We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay. Modern communication systems are becoming increasingly complex, and are required to handle multiple types of traffic with widely…

Machine Learning · Computer Science · 2021-05-04 · Mohammani Zaki, Avi Mohan, Aditya Gopalan, Shie Mannor

Energy consumption is one of the most critical concerns in designing computing devices, ranging from portable embedded systems to computer cluster systems. Furthermore, in the past decade, cluster systems have increasingly risen as popular…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-12-12 · Amirhossein Esmaili, Massoud Pedram

With the rapid growth in computing power demand, cloud native networks have emerged as a promising solution to address the challenges of efficient resource coordination, particularly in coping with the dynamic fluctuations of network…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-02-11 · Hao Jiang, Meng Qin, Ruijie Kuai, Dandan Liang, Yue Gao

Minimizing job scheduling time is a fundamental issue in data center networks that has been extensively studied in recent years. The incoming jobs require different CPU and memory units, and span different numbers of time slots. The…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-11-21 · Weijia Chen, Yuedong Xu, Xiaofeng Wu

In Grids, scheduling decisions are often made on the basis of jobs being either data or computation intensive: in data intensive situations jobs may be pushed to the data and in computation intensive situations data may be pulled to the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2007-07-06 · Richard McClatchey, Ashiq Anjum, Heinz Stockinger, Arshad Ali, Ian Willers, Michael Thomas

Distributed data processing systems like MapReduce, Spark, and Flink are popular tools for analysis of large datasets with cluster resources. Yet, users often overprovision resources for their data processing jobs, while the resource usage…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-02-16 · Lauritz Thamsen, Ilya Verbitskiy, Sasho Nedelkoski, Vinh Thuy Tran, Vinicius Meyer, Miguel G. Xavier, Odej Kao, Cesar A. F. De Rose

Kubernetes (k8s) has the potential to coordinate distributed edge resources and centralized cloud resources, but currently lacks a specialized scheduling framework for edge-cloud networks. Besides, the hierarchical distribution of…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-05-11 · Shihao Shen, Yiwen Han, Xiaofei Wang, Shiqiang Wang, Victor C. M. Leung

Multi-server queueing systems are widely used models for job scheduling in machine learning, wireless networks, crowdsourcing, and healthcare systems. This paper considers a multi-server system with multiple servers and multiple types of…

Machine Learning · Computer Science · 2023-06-05 · Zixian Yang, R. Srikant, Lei Ying

Computational Grids are enormous environments with heterogeneous resources and stable infrastructures among other Internet-based computing systems. However, managing resources in such systems poses its own problems. Scheduler systems…

Distributed, Parallel, and Cluster Computing · Computer Science · 2010-05-07 · Asgarali Bouyer, Mohammad Javad hoseyni, Abdul Hanan Abdullah

As human-robot collaboration increases in the workforce, it becomes essential for human-robot teams to coordinate efficiently and intuitively. Traditional approaches for human-robot scheduling either utilize exact methods that are…

Artificial Intelligence · Computer Science · 2023-02-01 · Batuhan Altundas, Zheyuan Wang, Joshua Bishop, Matthew Gombolay

Task graphs provide a simple way to describe scientific workflows (sets of tasks with dependencies) that can be executed on both HPC clusters and in the cloud. An important aspect of executing such graphs is the scheduling algorithm used.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-04-18 · Jakub Beránek, Stanislav Böhm, Vojtěch Cima

Scheduling deep learning (DL) models to train on powerful clusters with accelerators like GPUs and TPUs presently falls short, either lacking fine-grained heterogeneity awareness or leaving resources substantially under-utilized. To fill…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-03-17 · Abeda Sultana, Nabin Pakka, Fei Xu, Xu Yuan, Li Chen, Nian-Feng Tzeng