Related papers: Bring Your Own Codegen to Deep Learning Compiler

Compiler Toolchains for Deep Learning Workloads on Embedded Platforms

As the usage of deep learning becomes increasingly popular in mobile and embedded solutions, it is necessary to convert the framework-specific network representations into executable code for these embedded platforms. This paper consists of…

Programming Languages · Computer Science 2021-04-13 Max Sponner, Bernd Waschneck, Akash Kumar

Software-defined Design Space Exploration for an Efficient DNN Accelerator Architecture

Deep neural networks (DNNs) have been shown to outperform conventional machine learning algorithms across a wide range of applications, e.g., image recognition, object detection, robotics, and natural language processing. However, the high…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-23 Ye Yu, Yingmin Li, Shuai Che, Niraj K. Jha, Weifeng Zhang

A Framework to Enable Algorithmic Design Choice Exploration in DNNs

Deep learning technologies, particularly deep neural networks (DNNs), have demonstrated significant success across many domains. This success has been accompanied by substantial advancements and innovations in the algorithms behind the…

Machine Learning · Computer Science 2025-04-14 Timothy L. Cronin, Sanmukh Kuppannagari

Restoring the Broken Covenant Between Compilers and Deep Learning Accelerators

Deep learning accelerators address the computational demands of Deep Neural Networks (DNNs), departing from the traditional Von Neumann execution model. They leverage specialized hardware to align with the application domain's structure.…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-10-30 Sean Kinzer, Soroush Ghodrati, Rohan Mahapatra, Byung Hoon Ahn, Edwin Mascarenhas, Xiaolong Li, Janarbek Matai, Liang Zhang, Hadi Esmaeilzadeh

A Metaprogramming and Autotuning Framework for Deploying Deep Learning Applications

In recent years, deep neural networks (DNNs) have yielded strong results on a wide range of applications. Graphics Processing Units (GPUs) have been one key enabling factor leading to the current popularity of DNNs. However, despite…

Neural and Evolutionary Computing · Computer Science 2016-11-22 Matthew W. Moskewicz, Ali Jannesari, Kurt Keutzer

Learning on Hardware: A Tutorial on Neural Network Accelerators and Co-Processors

Deep neural networks (DNNs) have the advantage that they can take into account a large number of parameters, which enables them to solve complex tasks. In computer vision and speech recognition, they have a better accuracy than common…

Machine Learning · Computer Science 2021-04-20 Lukas Baischer, Matthias Wess, Nima TaheriNejad

How to Reach Real-Time AI on Consumer Devices? Solutions for Programmable and Custom Architectures

The unprecedented performance of deep neural networks (DNNs) has led to large strides in various Artificial Intelligence (AI) inference tasks, such as object and speech recognition. Nevertheless, deploying such AI models across commodity…

Machine Learning · Computer Science 2021-06-30 Stylianos I. Venieris, Ioannis Panopoulos, Ilias Leontiadis, Iakovos S. Venieris

A Multi-level Compiler Backend for Accelerated Micro-kernels Targeting RISC-V ISA Extensions

High-performance micro-kernels must fully exploit today's diverse and specialized hardware to deliver peak performance to DNNs. While higher-level optimizations for DNNs are offered by numerous compilers (e.g., MLIR, TVM, OpenXLA),…

Programming Languages · Computer Science 2025-02-07 Alexandre Lopoukhine, Federico Ficarelli, Christos Vasiladiotis, Anton Lydike, Josse Van Delm, Alban Dutilleul, Luca Benini, Marian Verhelst, Tobias Grosser

DLFusion: An Auto-Tuning Compiler for Layer Fusion on Deep Neural Network Accelerator

Many hardware vendors have introduced specialized deep neural network (DNN) accelerators owing to their superior performance and efficiency. As such, how to generate and optimize the code for the hardware accelerator becomes an important…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-12 Zihan Liu, Jingwen Leng, Quan Chen, Chao Li, Wenli Zheng, Li Li, Minyi Guo

Compilation and Optimizations for Efficient Machine Learning on Embedded Systems

Deep Neural Networks (DNNs) have achieved great success in a variety of machine learning (ML) applications, delivering high-quality inferencing solutions in areas such as computer vision, natural language processing, and virtual reality. However,…

Machine Learning · Computer Science 2022-08-29 Xiaofan Zhang, Yao Chen, Cong Hao, Sitao Huang, Yuhong Li, Deming Chen

Chainer: A Deep Learning Framework for Accelerating the Research Cycle

Software frameworks for neural networks play a key role in the development and application of deep learning methods. In this paper, we introduce the Chainer framework, which intends to provide a flexible, intuitive, and high performance…

Machine Learning · Computer Science 2019-08-02 Seiya Tokui, Ryosuke Okuta, Takuya Akiba, Yusuke Niitani, Toru Ogawa, Shunta Saito, Shuji Suzuki, Kota Uenishi, Brian Vogel, Hiroyuki Yamazaki Vincent

PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR

Deep neural networks (DNNs) are of critical use in different domains. To accelerate DNN computation, tensor compilers are proposed to generate efficient code on different domain-specific accelerators. Existing tensor compilers mainly focus…

Machine Learning · Computer Science 2023-07-12 Zixuan Ma, Haojie Wang, Jingze Xing, Liyan Zheng, Chen Zhang, Huanqi Cao, Kezhao Huang, Shizhi Tang, Penghan Wang, Jidong Zhai

Intel nGraph: An Intermediate Representation, Compiler, and Executor for Deep Learning

The Deep Learning (DL) community sees many novel topologies published each year. Achieving high performance on each new topology remains challenging, as each requires some level of manual effort. This issue is compounded by the…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-31 Scott Cyphers, Arjun K. Bansal, Anahita Bhiwandiwalla, Jayaram Bobba, Matthew Brookhart, Avijit Chakraborty, Will Constable, Christian Convey, Leona Cook, Omar Kanawi, Robert Kimball, Jason Knight, Nikolay Korovaiko, Varun Kumar, Yixing Lao, Christopher R. Lishka, Jaikrishnan Menon, Jennifer Myers, Sandeep Aswath Narayana, Adam Procter, Tristan J. Webb

ONNX-to-Hardware Design Flow for the Generation of Adaptive Neural-Network Accelerators on FPGAs

Neural Networks (NN) provide a solid and reliable way of executing different types of applications, ranging from speech recognition to medical diagnosis, speeding up onerous and long workloads. The challenges involved in their…

Hardware Architecture · Computer Science 2023-09-26 Federico Manca, Francesco Ratto

Designing Interpretable Approximations to Deep Reinforcement Learning

In an ever expanding set of research and application areas, deep neural networks (DNNs) set the bar for algorithm performance. However, depending upon additional constraints such as processing power and execution time limits, or…

Machine Learning · Computer Science 2021-06-22 Nathan Dahlin, Krishna Chaitanya Kalagarla, Nikhil Naik, Rahul Jain, Pierluigi Nuzzo

Field-Programmable Deep Neural Network (DNN) Learning and Inference accelerator: a concept

An accelerator is a specialized integrated circuit designed to perform specific computations faster than if those were performed by CPU or GPU. A Field-Programmable DNN learning and inference accelerator (FProg-DNN) using hybrid systolic…

Machine Learning · Computer Science 2018-03-26 Luiz M Franca-Neto

HybridDNN: A Framework for High-Performance Hybrid DNN Accelerator Design and Implementation

To speed up Deep Neural Network (DNN) accelerator design and enable effective implementation, we propose HybridDNN, a framework for building high-performance hybrid DNN accelerators and delivering FPGA-based hardware implementations. Novel…

Hardware Architecture · Computer Science 2020-04-09 Hanchen Ye, Xiaofan Zhang, Zhize Huang, Gengsheng Chen, Deming Chen

E3NE: An End-to-End Framework for Accelerating Spiking Neural Networks with Emerging Neural Encoding on FPGAs

Compiler frameworks are crucial for the widespread use of FPGA-based deep learning accelerators. They allow researchers and developers, who are not familiar with hardware engineering, to harness the performance attained by domain-specific…

Neural and Evolutionary Computing · Computer Science 2022-06-07 Daniel Gerlinghoff, Zhehui Wang, Xiaozhe Gu, Rick Siow Mong Goh, Tao Luo

Computational complexity reduction of deep neural networks

Deep neural networks (DNN) have been widely used and play a major role in the field of computer vision and autonomous navigation. However, these DNNs are computationally complex and their deployment over resource-constrained platforms is…

Machine Learning · Computer Science 2022-08-01 Mee Seong Im, Venkat R. Dasari

oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation

With the rapid development of deep learning models and hardware support for dense computing, the deep learning workload characteristics changed significantly from a few hot spots on compute-intensive operations to a broad range of…

Machine Learning · Computer Science 2024-03-12 Jianhui Li, Zhennan Qin, Yijie Mei, Jingze Cui, Yunfei Song, Ciyong Chen, Yifei Zhang, Longsheng Du, Xianhang Cheng, Baihui Jin, Yan Zhang, Jason Ye, Eric Lin, Dan Lavery