Related papers: Scheduling Real-time Deep Learning Services as Imp…

Scheduling Techniques of AI Models on Modern Heterogeneous Edge GPU -- A Critical Review

In recent years, the development of specialized edge computing devices has significantly increased, driven by the growing demand for AI models. These devices, such as the NVIDIA Jetson series, must efficiently handle increased data…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-03 Ashiyana Abdul Majeed, Mahmoud Meribout

Value Function Based Performance Optimization of Deep Learning Workloads

As machine learning techniques become ubiquitous, the efficiency of neural network implementations is becoming correspondingly paramount. Frameworks, such as Halide and TVM, separate out the algorithmic representation of the network from…

Machine Learning · Computer Science 2020-12-01 Benoit Steiner, Chris Cummins, Horace He, Hugh Leather

DeepRT: A Soft Real Time Scheduler for Computer Vision Applications on the Edge

The ubiquity of smartphone cameras and IoT cameras, together with the recent boom of deep learning and deep neural networks, proliferate various computer vision driven mobile and IoT applications deployed on the edge. This paper focuses on…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-06 Zhe Yang, Klara Nahrstedt, Hongpeng Guo, Qian Zhou

EdgeServing: Deadline-Aware Multi-DNN Serving at the Edge

As edge computing expands, serving multiple deep neural network (DNN) models on a single shared GPU has become a common yet challenging scenario, where each scheduling decision affects the tail latency of all concurrent queues. Existing…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-08 Jiahe Cao, Xiaomeng Li, Qiang Liu, Tao Han, Ning Zhang, Weisong Shi

Edge Learning with Timeliness Constraints: Challenges and Solutions

Future machine learning (ML) powered applications, such as autonomous driving and augmented reality, involve training and inference tasks with timeliness requirements and are communication and computation intensive, which demands for the…

Networking and Internet Architecture · Computer Science 2020-09-24 Yuxuan Sun, Wenqi Shi, Xiufeng Huang, Sheng Zhou, Zhisheng Niu

Enhancing Kubernetes Automated Scheduling with Deep Learning and Reinforcement Techniques for Large-Scale Cloud Computing Optimization

With the continuous expansion of the scale of cloud computing applications, artificial intelligence technologies such as Deep Learning and Reinforcement Learning have gradually become the key tools to solve the automated task scheduling of…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-03-14 Zheng Xu, Yulu Gong, Yanlin Zhou, Qiaozhi Bao, Wenpin Qian

Deadline-Aware Joint Task Scheduling and Offloading in Mobile Edge Computing Systems

The demand for stringent interactive quality-of-service has intensified in both mobile edge computing (MEC) and cloud systems, driven by the imperative to improve user experiences. As a result, the processing of computation-intensive tasks…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-28 Ngoc Hung Nguyen, Van-Dinh Nguyen, Anh Tuan Nguyen, Nguyen Van Thieu, Hoang Nam Nguyen, Symeon Chatzinotas

Scheduling Splittable Jobs on Configurable Machines

Motivated by deep neural network applications, we study the problem of scheduling splittable jobs (e.g., neural network inference tasks) on configurable machines (e.g., multi-instance GPUs). We are given $n$ jobs and a set $C$ of…

Data Structures and Algorithms · Computer Science 2023-12-12 Matthew Casey, Rajmohan Rajaraman, David Stalfa

Learning to Schedule: A Supervised Learning Framework for Network-Aware Scheduling of Data-Intensive Workloads

Distributed cloud environments hosting data-intensive applications often experience slowdowns due to network congestion, asymmetric bandwidth, and inter-node data shuffling. These factors are typically not captured by traditional host-level…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-21 Sankalpa Timilsina, Susmit Shannigrahi

Neural Heterogeneous Scheduler

Access to parallel and distributed computation has enabled researchers and developers to improve algorithms and performance in many applications. Recent research has focused on next generation special purpose systems with multiple kinds of…

Machine Learning · Computer Science 2019-06-11 Tegg Taekyong Sung, Valliappa Chockalingam, Alex Yahja, Bo Ryu

Deep Reinforcement Learning for Multi-Resource Multi-Machine Job Scheduling

Minimizing job scheduling time is a fundamental issue in data center networks that has been extensively studied in recent years. The incoming jobs require different CPU and memory units, and span different number of time slots. The…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-21 Weijia Chen, Yuedong Xu, Xiaofeng Wu

A Deep Reinforcement Learning Approach to Multi-component Job Scheduling in Edge Computing

We are interested in the optimal scheduling of a collection of multi-component application jobs in an edge computing system that consists of geo-distributed edge computing nodes connected through a wide area network. The scheduling and…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-24 Zhi Cao, Honggang Zhang, Yu Cao, Benyuan Liu

Efficient and Robust Machine Learning for Real-World Systems

While machine learning is traditionally a resource intensive task, embedded systems, autonomous navigation and the vision of the Internet-of-Things fuel the interest in resource efficient approaches. These approaches require a carefully…

Machine Learning · Computer Science 2018-12-07 Franz Pernkopf, Wolfgang Roth, Matthias Zoehrer, Lukas Pfeifenberger, Guenther Schindler, Holger Froening, Sebastian Tschiatschek, Robert Peharz, Matthew Mattina, Zoubin Ghahramani

Deep Learning for Unrelated-Machines Scheduling: Handling Variable Dimensions

Deep learning has been effectively applied to many discrete optimization problems. However, learning-based scheduling on unrelated parallel machines remains particularly difficult to design. Not only do the numbers of jobs and machines…

Machine Learning · Computer Science 2025-12-23 Diego Hitzges, Guillaume Sagnol

A Dynamic Distributed Scheduler for Computing on the Edge

Edge computing has become a promising computing paradigm for building IoT (Internet of Things) applications, particularly for applications with specific constraints such as latency or privacy requirements. Due to resource constraints at the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-15 Fei Hu, Kunal Mehta, Shivakant Mishra, Mohammad AlMutawa

Scheduling Inference Workloads on Distributed Edge Clusters with Reinforcement Learning

Many real-time applications (e.g., Augmented/Virtual Reality, cognitive assistance) rely on Deep Neural Networks (DNNs) to process inference tasks. Edge computing is considered a key infrastructure to deploy such applications, as moving…

Machine Learning · Computer Science 2023-02-01 Gabriele Castellano, Juan-José Nieto, Jordi Luque, Ferrán Diego, Carlos Segura, Diego Perino, Flavio Esposito, Fulvio Risso, Aravindh Raman

BCEdge: SLO-Aware DNN Inference Services with Adaptive Batching on Edge Platforms

As deep neural networks (DNNs) are being applied to a wide range of edge intelligent applications, it is critical for edge inference platforms to have both high-throughput and low-latency at the same time. Such edge platforms with multiple…

Machine Learning · Computer Science 2023-05-03 Ziyang Zhang, Huan Li, Yang Zhao, Changyao Lin, Jie Liu

Understanding Auto-Scheduling Optimizations for Model Deployment via Visualizations

After completing the design and training phases, deploying a deep learning model onto specific hardware is essential before practical implementation. Targeted optimizations are necessary to enhance the model's performance by reducing…

Human-Computer Interaction · Computer Science 2023-08-10 Laixin Xie, Chenyang Zhang, Ruofei Ma, Xing Jiang, Xingxing Xing, Wei Wan, Quan Li

Interference-Aware Edge Runtime Prediction with Conformal Matrix Completion

Accurately estimating workload runtime is a longstanding goal in computer systems, and plays a key role in efficient resource provisioning, latency minimization, and various other system management tasks. Runtime prediction is particularly…

Machine Learning · Computer Science 2025-03-11 Tianshu Huang, Arjun Ramesh, Emily Ruppel, Nuno Pereira, Anthony Rowe, Carlee Joe-Wong

Edge-device Collaborative Computing for Multi-view Classification

Motivated by the proliferation of Internet-of-Thing (IoT) devices and the rapid advances in the field of deep learning, there is a growing interest in pushing deep learning computations, conventionally handled by the cloud, to the edge of…

Machine Learning · Computer Science 2024-09-25 Marco Palena, Tania Cerquitelli, Carla Fabiana Chiasserini