Related papers: Hidet: Task-Mapping Programming Paradigm for Deep …

Value Function Based Performance Optimization of Deep Learning Workloads

As machine learning techniques become ubiquitous, the efficiency of neural network implementations is becoming correspondingly paramount. Frameworks, such as Halide and TVM, separate out the algorithmic representation of the network from…

Machine Learning · Computer Science 2020-12-01 Benoit Steiner, Chris Cummins, Horace He, Hugh Leather

Explore as a Storm, Exploit as a Raindrop: On the Benefit of Fine-Tuning Kernel Schedulers with Coordinate Descent

Machine-learning models consist of kernels, which are algorithms applying operations on tensors -- data indexed by a linear combination of natural numbers. Examples of kernels include convolutions, transpositions, and vectorial products.…

Machine Learning · Computer Science 2024-07-16 Michael Canesche, Gaurav Verma, Fernando Magno Quintao Pereira

Fast and Adaptive Task Management in MEC: A Deep Learning Approach Using Pointer Networks

Task offloading and scheduling in Mobile Edge Computing (MEC) are vital for meeting the low-latency demands of modern IoT and dynamic task scheduling scenarios. MEC reduces the processing burden on resource-constrained devices by enabling…

Networking and Internet Architecture · Computer Science 2026-01-23 Arild Yonkeu, Mohammadreza Amini, Burak Kantarci

Transfer-Tuning: Reusing Auto-Schedules for Efficient Tensor Program Code Generation

Auto-scheduling for tensor programs is a process where a search algorithm automatically explores candidate schedules (program transformations) for a given program on a target hardware platform to improve its performance. However this can be…

Machine Learning · Computer Science 2022-09-08 Perry Gibson, José Cano

Learning to Optimize Tensor Programs

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective…

Machine Learning · Computer Science 2019-01-10 Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

GCNScheduler: Scheduling Distributed Computing Applications using Graph Convolutional Networks

We consider the classical problem of scheduling task graphs corresponding to complex applications on distributed computing systems. A number of heuristics have been previously proposed to optimize task scheduling with respect to metrics…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-25 Mehrdad Kiamari, Bhaskar Krishnamachari

Scheduling Real-time Deep Learning Services as Imprecise Computations

The paper presents an efficient real-time scheduling algorithm for intelligent real-time edge services, defined as those that perform machine intelligence tasks, such as voice recognition, LIDAR processing, or machine vision, on behalf of…

Machine Learning · Computer Science 2020-11-03 Shuochao Yao, Yifan Hao, Yiran Zhao, Huajie Shao, Dongxin Liu, Shengzhong Liu, Tianshi Wang, Jinyang Li, Tarek Abdelzaher

Scheduling Inference Workloads on Distributed Edge Clusters with Reinforcement Learning

Many real-time applications (e.g., Augmented/Virtual Reality, cognitive assistance) rely on Deep Neural Networks (DNNs) to process inference tasks. Edge computing is considered a key infrastructure to deploy such applications, as moving…

Machine Learning · Computer Science 2023-02-01 Gabriele Castellano, Juan-José Nieto, Jordi Luque, Ferrán Diego, Carlos Segura, Diego Perino, Flavio Esposito, Fulvio Risso, Aravindh Raman

Pushing Tensor Accelerators Beyond MatMul in a User-Schedulable Language

Tensor accelerators now represent a growing share of compute resources in modern CPUs and GPUs. However, they are hard to program, leading developers to use vendor-provided kernel libraries that support tensor accelerators. As a result, the…

Programming Languages · Computer Science 2026-02-12 Yihong Zhang, Derek Gerstmann, Andrew Adams, Maaz Bin Safeer Ahmad

Ansor: Generating High-Performance Tensor Programs for Deep Learning

High-performance tensor programs are crucial to guarantee efficient execution of deep neural networks. However, obtaining performant tensor programs for different operators on various hardware platforms is notoriously challenging.…

Machine Learning · Computer Science 2023-10-17 Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, Ameer Haj-Ali, Yida Wang, Jun Yang, Danyang Zhuo, Koushik Sen, Joseph E. Gonzalez, Ion Stoica

Resource Heterogeneity-Aware and Utilization-Enhanced Scheduling for Deep Learning Clusters

Scheduling deep learning (DL) models to train on powerful clusters with accelerators like GPUs and TPUs, presently falls short, either lacking fine-grained heterogeneity awareness or leaving resources substantially under-utilized. To fill…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-17 Abeda Sultana, Nabin Pakka, Fei Xu, Xu Yuan, Li Chen, Nian-Feng Tzeng

TLP: A Deep Learning-based Cost Model for Tensor Program Tuning

Tensor program tuning is a non-convex objective optimization problem, to which search-based approaches have proven to be effective. At the core of the search-based approaches lies the design of the cost model. Though deep learning-based…

Machine Learning · Computer Science 2022-11-23 Yi Zhai, Yu Zhang, Shuo Liu, Xiaomeng Chu, Jie Peng, Jianmin Ji, Yanyong Zhang

DeepSoCS: A Neural Scheduler for Heterogeneous System-on-Chip (SoC) Resource Scheduling

In this paper, we present a novel scheduling solution for a class of System-on-Chip (SoC) systems where heterogeneous chip resources (DSP, FPGA, GPU, etc.) must be efficiently scheduled for continuously arriving hierarchical jobs with their…

Artificial Intelligence · Computer Science 2020-06-08 Tegg Taekyong Sung, Jeongsoo Ha, Jeewoo Kim, Alex Yahja, Chae-Bong Sohn, Bo Ryu

Machine Learning for Scheduling: A Paradigm Shift from Solver-Centric to Data-Centric Approaches

Scheduling problems are a fundamental class of combinatorial optimization problems that underpin operational efficiency in manufacturing, logistics, and service systems. While operations research has traditionally developed solver-centric…

Optimization and Control · Mathematics 2026-02-03 Anbang Liu, Shaochong Lin, Jingchuan Chen, Peng Wu, Zuojun Max Shen

Learning to Schedule: A Supervised Learning Framework for Network-Aware Scheduling of Data-Intensive Workloads

Distributed cloud environments hosting data-intensive applications often experience slowdowns due to network congestion, asymmetric bandwidth, and inter-node data shuffling. These factors are typically not captured by traditional host-level…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-21 Sankalpa Timilsina, Susmit Shannigrahi

TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

There is an increasing need to bring machine learning to a wide diversity of hardware devices. Current frameworks rely on vendor-specific operator libraries and optimize for a narrow range of server-class GPUs. Deploying workloads to new…

Machine Learning · Computer Science 2018-10-09 Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Transfer Learning Across Heterogeneous Features For Efficient Tensor Program Generation

Tuning tensor program generation involves searching for various possible program transformation combinations for a given program on target hardware to optimize the tensor program execution. It is already a complex process because of the…

Programming Languages · Computer Science 2023-12-29 Gaurav Verma, Siddhisanket Raskar, Zhen Xie, Abid M Malik, Murali Emani, Barbara Chapman

Pruner: A Draft-then-Verify Exploration Mechanism to Accelerate Tensor Program Tuning

Tensor program tuning is essential for the efficient deployment of deep neural networks. Search-based approaches have demonstrated scalability and effectiveness in automatically finding high-performance programs for specific hardware.…

Machine Learning · Computer Science 2025-04-10 Liang Qiao, Jun Shi, Xiaoyu Hao, Xi Fang, Sen Zhang, Minfan Zhao, Ziqi Zhu, Junshi Chen, Hong An, Xulong Tang, Bing Li, Honghui Yuan, Xinyang Wang

TensorSocket: Shared Data Loading for Deep Learning Training

Training deep learning models is a repetitive and resource-intensive process. Data scientists often train several models before landing on a set of parameters (e.g., hyper-parameter tuning) and model architecture (e.g., neural architecture…

Machine Learning · Computer Science 2025-08-04 Ties Robroek, Neil Kim Nielsen, Pınar Tözün

Ensuring Data Freshness in Multi-Rate Task Chains Scheduling

In safety-critical autonomous systems, data freshness presents a fundamental design challenge. While the Logical Execution Time (LET) paradigm ensures compositional determinism, it often does so at the cost of injected latency, degrading…

Operating Systems · Computer Science 2026-03-11 José Luis Conradi Hoffmann, Antônio Augusto Fröhlich