Related papers: Compiler Toolchains for Deep Learning Workloads on…

Bring Your Own Codegen to Deep Learning Compiler

Deep neural networks (DNNs) have been ubiquitously applied in many applications, and accelerators have emerged as an enabler to support the fast and efficient inference tasks of these applications. However, to achieve high model coverage…

Machine Learning · Computer Science 2021-05-10 Zhi Chen, Cody Hao Yu, Trevor Morris, Jorn Tuyls, Yi-Hsiang Lai, Jared Roesch, Elliott Delaye, Vin Sharma, Yida Wang

Continuous Deep Learning: A Workflow to Bring Models into Production

Researchers have been highly active in investigating the classical machine learning workflow and integrating best practices from the software engineering lifecycle. However, deep learning exhibits deviations that are not yet covered in this…

Software Engineering · Computer Science 2022-08-30 Janosch Baltensperger, Pasquale Salza, Harald C. Gall

A High-Level Compiler Integration Approach for Deep Learning Accelerators Supporting Abstraction and Optimization

The growing adoption of domain-specific architectures in edge computing platforms for deep learning has highlighted the efficiency of hardware accelerators. However, integrating custom accelerators into modern machine learning (ML)…

Machine Learning · Computer Science 2025-07-08 Samira Ahmadifarsani, Daniel Mueller-Gritschneder, Ulf Schlichtmann

A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures

Given their increasing size and complexity, the need for efficient execution of deep neural networks has become increasingly pressing in the design of heterogeneous High-Performance Computing (HPC) and edge platforms, leading to a wide…

Hardware Architecture · Computer Science 2025-05-23 Serena Curzel, Fabrizio Ferrandi, Leandro Fiorin, Daniele Ielmini, Cristina Silvano, Francesco Conti, Luca Bompani, Luca Benini, Enrico Calore, Sebastiano Fabio Schifano, Cristian Zambelli, Maurizio Palesi, Giuseppe Ascia, Enrico Russo, Valeria Cardellini, Salvatore Filippone, Francesco Lo Presti, Stefania Perri

Challenges and Obstacles Towards Deploying Deep Learning Models on Mobile Devices

From computer vision and speech recognition to forecasting trajectories in autonomous vehicles, deep learning approaches are at the forefront of so many domains. Deep learning models are developed using a plethora of high-level, generic…

Machine Learning · Computer Science 2021-05-07 Hamid Tabani, Ajay Balasubramaniam, Elahe Arani, Bahram Zonooz

The Deep Learning Compiler: A Comprehensive Survey

The difficulty of deploying various deep learning (DL) models on diverse DL hardware has boosted the research and development of DL compilers in the community. Several DL compilers have been proposed from both industry and academia such as…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-26 Mingzhen Li, Yi Liu, Xiaoyan Liu, Qingxiao Sun, Xin You, Hailong Yang, Zhongzhi Luan, Lin Gan, Guangwen Yang, Depei Qian

Leveraging Neural Graph Compilers in Machine Learning Research for Edge-Cloud Systems

This work presents a comprehensive evaluation of neural network graph compilers across heterogeneous hardware platforms, addressing the critical gap between theoretical optimization techniques and practical deployment scenarios. We…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-30 Alireza Furutanpey, Carmen Walser, Philipp Raith, Pantelis A. Frangoudis, Schahram Dustdar

Performance Evaluation of Deep Learning Tools in Docker Containers

With the success of deep learning techniques in a broad range of application domains, many deep learning software frameworks have been developed and are being updated frequently to adapt to new hardware features and software libraries,…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-10 Pengfei Xu, Shaohuai Shi, Xiaowen Chu

Deep Learning Inference on Heterogeneous Mobile Processors: Potentials and Pitfalls

There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications. Equipped with a variety of processing units such as CPUs, GPUs, and NPUs, the…

Machine Learning · Computer Science 2024-05-06 Sicong Liu, Wentao Zhou, Zimu Zhou, Bin Guo, Minfan Wang, Cheng Fang, Zheng Lin, Zhiwen Yu

Pipeline for recording datasets and running neural networks on the Bela embedded hardware platform

Deploying deep learning models on embedded devices is an arduous task: oftentimes, there exist no platform-specific instructions, and compilation times can be considerably large due to the limited computational resources available…

Sound · Computer Science 2023-06-21 Teresa Pelinski, Rodrigo Diaz, Adán L. Benito Temprano, Andrew McPherson

Performance Analysis of Deep Learning Workloads on a Composable System

A composable infrastructure is defined as resources, such as compute, storage, accelerators and networking, that are shared in a pool and that can be grouped in various configurations to meet application requirements. This freedom to 'mix…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-22 Kauotar El Maghraoui, Lorraine M. Herger, Chekuri Choudary, Kim Tran, Todd Deshane, David Hanson

ElasticAI: Creating and Deploying Energy-Efficient Deep Learning Accelerator for Pervasive Computing

Deploying Deep Learning (DL) on embedded end devices is a scorching trend in pervasive computing. Since most Microcontrollers on embedded devices have limited computing power, it is necessary to add a DL accelerator. Embedded Field…

Hardware Architecture · Computer Science 2024-09-17 Chao Qian, Tianheng Ling, Gregor Schiele

Transparent Compiler and Runtime Specializations for Accelerating Managed Languages on FPGAs

In recent years, heterogeneous computing has emerged as the vital way to increase computers' performance and energy efficiency by combining diverse hardware devices, such as Graphics Processing Units (GPUs) and Field Programmable Gate…

Programming Languages · Computer Science 2020-11-02 Michail Papadimitriou, Juan Fumero, Athanasios Stratikopoulos, Foivos S. Zakkak, Christos Kotselidis

Real-Time Machine Learning: The Missing Pieces

Machine learning applications are increasingly deployed not only to serve predictions using static models, but also as tightly-integrated components of feedback loops involving dynamic, real-time decision making. These applications pose a…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-05-23 Robert Nishihara, Philipp Moritz, Stephanie Wang, Alexey Tumanov, William Paul, Johann Schleier-Smith, Richard Liaw, Mehrdad Niknami, Michael I. Jordan, Ion Stoica

Compiler Optimization for Quantum Computing Using Reinforcement Learning

Any quantum computing application, once encoded as a quantum circuit, must be compiled before being executable on a quantum computer. Similar to classical compilation, quantum compilation is a sequential process with many compilation steps…

Quantum Physics · Physics 2024-06-25 Nils Quetschlich, Lukas Burgholzer, Robert Wille

Chainer: A Deep Learning Framework for Accelerating the Research Cycle

Software frameworks for neural networks play a key role in the development and application of deep learning methods. In this paper, we introduce the Chainer framework, which intends to provide a flexible, intuitive, and high performance…

Machine Learning · Computer Science 2019-08-02 Seiya Tokui, Ryosuke Okuta, Takuya Akiba, Yusuke Niitani, Toru Ogawa, Shunta Saito, Shuji Suzuki, Kota Uenishi, Brian Vogel, Hiroyuki Yamazaki Vincent

Machine Learning in Compiler Optimisation

In the last decade, machine learning based compilation has moved from an obscure research niche to a mainstream activity. In this article, we describe the relationship between machine learning and compiler optimisation and introduce the…

Programming Languages · Computer Science 2018-05-10 Zheng Wang, Michael O'Boyle

A Survey and Empirical Evaluation of Parallel Deep Learning Frameworks

The field of deep learning has witnessed a remarkable shift towards extremely compute- and memory-intensive neural networks. These newer larger models have enabled researchers to advance state-of-the-art tools across a variety of fields.…

Machine Learning · Computer Science 2022-07-04 Daniel Nichols, Siddharth Singh, Shu-Huai Lin, Abhinav Bhatele

Compilation and Optimizations for Efficient Machine Learning on Embedded Systems

Deep Neural Networks (DNNs) have achieved great success in a variety of machine learning (ML) applications, delivering high-quality inferencing solutions in computer vision, natural language processing, and virtual reality, etc. However,…

Machine Learning · Computer Science 2022-08-29 Xiaofan Zhang, Yao Chen, Cong Hao, Sitao Huang, Yuhong Li, Deming Chen

StreamBlocks: A compiler for heterogeneous dataflow computing (technical report)

To increase performance and efficiency, systems use FPGAs as reconfigurable accelerators. A key challenge in designing these systems is partitioning computation between processors and an FPGA. An appropriate division of labor may be…

Hardware Architecture · Computer Science 2021-07-21 Endri Bezati, Mahyar Emami, Jörn Janneck, James Larus