Related papers: PREMA: A Predictive Multi-task Scheduling Algorith…

Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs

With deep neural networks (DNNs) emerging as the backbone in a multitude of computer vision tasks, their adoption in real-world applications broadens continuously. Given the abundance and omnipresence of smart devices in the consumer…

Machine Learning · Computer Science 2023-08-08 Alexandros Kouris, Stylianos I. Venieris, Stefanos Laskaridis, Nicholas D. Lane

On Non-Preemptive VM Scheduling in the Cloud

We study the problem of scheduling VMs (Virtual Machines) in a distributed server platform, motivated by cloud computing applications. The VMs arrive dynamically over time to the system, and require a certain amount of resources (e.g.…

Networking and Internet Architecture · Computer Science 2018-07-04 Konstantinos Psychas, Javad Ghaderi

Task-based preemptive scheduling on FPGAs leveraging partial reconfiguration

FPGAs are an attractive type of accelerator for all-purpose HPC computing systems due to the possibility of deploying tailored hardware on demand. However, the common tools for programming and operating FPGAs are still complex to use,…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-19 Gabriel Rodriguez-Canal, Nick Brown, Yuri Torres, Arturo Gonzalez-Escribano

Preemption Aware Task Scheduling for Priority and Deadline Constrained DNN Inference Task Offloading in Homogeneous Mobile-Edge Networks

This paper addresses the computational offloading of Deep Neural Networks (DNNs) to nearby devices with similar processing capabilities, to avoid the larger communication delays incurred for cloud offloading. We present a preemption aware…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-24 Jamie Cotter, Ignacio Castineiras, Donna O'Shea, Victor Cionca

Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference

Large Language Models have revolutionized natural language processing, yet serving them efficiently in data centers remains challenging due to mixed workloads comprising latency-sensitive (LS) and best-effort (BE) jobs. Existing inference…

Machine Learning · Computer Science 2025-03-13 Mohammad Siavashi, Faezeh Keshmiri Dindarloo, Dejan Kostic, Marco Chiesa

Automated Runtime-Aware Scheduling for Multi-Tenant DNN Inference on GPU

With the fast development of deep neural networks (DNNs), many real-world applications are adopting multiple models to conduct compound tasks, such as co-running classification, detection, and segmentation models on autonomous vehicles.…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-11-30 Fuxun Yu, Shawn Bray, Di Wang, Longfei Shangguan, Xulong Tang, Chenchen Liu, Xiang Chen

Online Non-preemptive Scheduling on Unrelated Machines with Rejections

When a computer system schedules jobs there is typically a significant cost associated with preempting a job during execution. This cost can be from the expensive task of saving the memory's state and loading data into and out of memory. It…

Data Structures and Algorithms · Computer Science 2018-03-01 Giorgio Lucarelli, Benjamin Moseley, Nguyen Kim Thang, Abhinav Srivastav, Denis Trystram

Profitable Scheduling on Multiple Speed-Scalable Processors

We present a new online algorithm for profit-oriented scheduling on multiple speed-scalable processors. Moreover, we provide a tight analysis of the algorithm's competitiveness. Our results generalize and improve upon work by…

Data Structures and Algorithms · Computer Science 2012-09-19 Peter Kling, Peter Pietrzyk

An efficient cloud scheduler design supporting preemptible instances

Maximizing resource utilization by performing an efficient resource provisioning is a key factor for any cloud provider: commercial actors can maximize their revenues, whereas scientific and non-commercial providers can maximize their…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-29 Álvaro López García, Enol Fernández-del-Castillo, Isabel Campos Plasencia

D-STACK: High Throughput DNN Inference by Effective Multiplexing and Spatio-Temporal Scheduling of GPUs

Hardware accelerators such as GPUs are required for real-time, low-latency inference with Deep Neural Networks (DNN). However, due to the inherent limits to the parallelism they can exploit, DNNs often under-utilize the capacity of today's…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-27 Aditya Dhakal, Sameer G. Kulkarni, K. K. Ramakrishnan

Learning-Augmented Online Scheduling with Parsimonious Preemption

Learning-augmented algorithms have emerged as a powerful paradigm to surpass traditional worst-case lower bounds by integrating potentially noisy predictions. While this framework has seen success in online scheduling, existing work…

Machine Learning · Computer Science 2026-05-25 Mugen Blue, Sungjin Im, Alexander Lindermayr

Flex-TPU: A Flexible TPU with Runtime Reconfigurable Dataflow Architecture

Tensor processing units (TPUs) are one of the most well-known machine learning (ML) accelerators utilized at large scale in data centers as well as in tiny ML applications. TPUs offer several improvements and advantages over conventional ML…

Hardware Architecture · Computer Science 2024-07-12 Mohammed Elbtity, Peyton Chandarana, Ramtin Zand

An efficient and flexible inference system for serving heterogeneous ensembles of deep neural networks

Ensembles of Deep Neural Networks (DNNs) have achieved qualitative predictions but they are computing and memory intensive. Therefore, the demand is growing to make them answer a heavy workload of requests with available computational…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-08-31 Pierrick Pochelu, Serge G. Petiton, Bruno Conche

Dynamic Ready Queue Based Process Priority Scheduling Algorithm

CPU scheduling is the reason behind the performance of multiprocessing and in time-shared operating systems. Different scheduling criteria are used to evaluate Central Processing Unit Scheduling algorithms which are based on different…

Operating Systems · Computer Science 2022-05-17 Raghav Dalmia, Aryaman Sinha, Ruchi Verma, P. K. Gupta

Throughput Maximization of DNN Inference: Batching or Multi-Tenancy?

Deployment of real-time ML services on warehouse-scale infrastructures is on the increase. Therefore, decreasing latency and increasing throughput of deep neural network (DNN) inference applications that empower those services have…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-29 Seyed Morteza Nabavinejad, Masoumeh Ebrahimi, Sherief Reda

An Optimal Real-Time Scheduling Approach: From Multiprocessor to Uniprocessor

An optimal solution to the problem of scheduling real-time tasks on a set of identical processors is derived. The described approach is based on solving an equivalent uniprocessor real-time scheduling problem. Although there are other…

Operating Systems · Computer Science 2011-04-19 Paul Regnier, George Lima, Ernesto Massa

Improved Approximation Algorithms for Non-Preemptive Throughput Maximization

The (Non-Preemptive) Throughput Maximization problem is a natural and fundamental scheduling problem. We are given $n$ jobs, where each job $j$ is characterized by a processing time and a time window, contained in a global interval $[0,T)$,…

Data Structures and Algorithms · Computer Science 2026-04-01 Alexander Armbruster, Fabrizio Grandoni, Antoine Tinguely, Andreas Wiese

Energy and Time Efficient Scheduling of Tasks with Dependencies on Asymmetric Multiprocessors

In this work we study the problem of scheduling tasks with dependencies in multiprocessor architectures where processors have different speeds. We present the preemptive algorithm "Save-Energy" that given a schedule of tasks it post…

Distributed, Parallel, and Cluster Computing · Computer Science 2008-06-09 Ioannis Chatzigiannakis, Georgios Giannoulis, Paul G. Spirakis

Predictive Scheduling for Efficient Inference-Time Reasoning in Large Language Models

Large language models (LLMs) achieve state-of-the-art accuracy on complex reasoning tasks by generating multiple chain-of-thought (CoT) traces, but using a fixed token budget per query leads to over-computation on easy inputs and…

Artificial Intelligence · Computer Science 2026-02-03 Katrina Brown, Aneesh Muppidi, Rana Shahout

Timely-Throughput Optimal Scheduling with Prediction

Motivated by the increasing importance of providing delay-guaranteed services in general computing and communication systems, and the recent wide adoption of learning and prediction in network control, in this work, we consider a general…

Networking and Internet Architecture · Computer Science 2018-01-08 Kun Chen, Longbo Huang