Related papers: Compiler Toolchains for Deep Learning Workloads on…
Deep neural networks (DNNs) have been ubiquitously applied in many applications, and accelerators are emerged as an enabler to support the fast and efficient inference tasks of these applications. However, to achieve high model coverage…
Researchers have been highly active to investigate the classical machine learning workflow and integrate best practices from the software engineering lifecycle. However, deep learning exhibits deviations that are not yet covered in this…
The growing adoption of domain-specific architectures in edge computing platforms for deep learning has highlighted the efficiency of hardware accelerators. However, integrating custom accelerators into modern machine learning (ML)…
Given their increasing size and complexity, the need for efficient execution of deep neural networks has become increasingly pressing in the design of heterogeneous High-Performance Computing (HPC) and edge platforms, leading to a wide…
From computer vision and speech recognition to forecasting trajectories in autonomous vehicles, deep learning approaches are at the forefront of so many domains. Deep learning models are developed using plethora of high-level, generic…
The difficulty of deploying various deep learning (DL) models on diverse DL hardware has boosted the research and development of DL compilers in the community. Several DL compilers have been proposed from both industry and academia such as…
This work presents a comprehensive evaluation of neural network graph compilers across heterogeneous hardware platforms, addressing the critical gap between theoretical optimization techniques and practical deployment scenarios. We…
With the success of deep learning techniques in a broad range of application domains, many deep learning software frameworks have been developed and are being updated frequently to adapt to new hardware features and software libraries,…
There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications. Equipped with a variety of processing units such as CPUs, GPUs, and NPUs, the…
Deploying deep learning models on embedded devices is an arduous task: oftentimes, there exist no platform-specific instructions, and compilation times can be considerably large due to the limited computational resources available…
A composable infrastructure is defined as resources, such as compute, storage, accelerators and networking, that are shared in a pool and that can be grouped in various configurations to meet application requirements. This freedom to 'mix…
Deploying Deep Learning (DL) on embedded end devices is a scorching trend in pervasive computing. Since most Microcontrollers on embedded devices have limited computing power, it is necessary to add a DL accelerator. Embedded Field…
In recent years, heterogeneous computing has emerged as the vital way to increase computers? performance and energy efficiency by combining diverse hardware devices, such as Graphics Processing Units (GPUs) and Field Programmable Gate…
Machine learning applications are increasingly deployed not only to serve predictions using static models, but also as tightly-integrated components of feedback loops involving dynamic, real-time decision making. These applications pose a…
Any quantum computing application, once encoded as a quantum circuit, must be compiled before being executable on a quantum computer. Similar to classical compilation, quantum compilation is a sequential process with many compilation steps…
Software frameworks for neural networks play a key role in the development and application of deep learning methods. In this paper, we introduce the Chainer framework, which intends to provide a flexible, intuitive, and high performance…
In the last decade, machine learning based compilation has moved from an an obscure research niche to a mainstream activity. In this article, we describe the relationship between machine learning and compiler optimisation and introduce the…
The field of deep learning has witnessed a remarkable shift towards extremely compute- and memory-intensive neural networks. These newer larger models have enabled researchers to advance state-of-the-art tools across a variety of fields.…
Deep Neural Networks (DNNs) have achieved great success in a variety of machine learning (ML) applications, delivering high-quality inferencing solutions in computer vision, natural language processing, and virtual reality, etc. However,…
To increase performance and efficiency, systems use FPGAs as reconfigurable accelerators. A key challenge in designing these systems is partitioning computation between processors and an FPGA. An appropriate division of labor may be…