Related papers: Compiler Toolchains for Deep Learning Workloads on…

200 papers

Deep neural networks (DNNs) have been ubiquitously applied in many applications, and accelerators have emerged as an enabler of fast and efficient inference for these applications. However, to achieve high model coverage…

Machine Learning · Computer Science 2021-05-10 Zhi Chen , Cody Hao Yu , Trevor Morris , Jorn Tuyls , Yi-Hsiang Lai , Jared Roesch , Elliott Delaye , Vin Sharma , Yida Wang

Researchers have been highly active in investigating the classical machine learning workflow and integrating best practices from the software engineering lifecycle. However, deep learning exhibits deviations that are not yet covered in this…

Software Engineering · Computer Science 2022-08-30 Janosch Baltensperger , Pasquale Salza , Harald C. Gall

The growing adoption of domain-specific architectures in edge computing platforms for deep learning has highlighted the efficiency of hardware accelerators. However, integrating custom accelerators into modern machine learning (ML)…

Machine Learning · Computer Science 2025-07-08 Samira Ahmadifarsani , Daniel Mueller-Gritschneder , Ulf Schlichtmann

Given their increasing size and complexity, the need for efficient execution of deep neural networks has become increasingly pressing in the design of heterogeneous High-Performance Computing (HPC) and edge platforms, leading to a wide…

From computer vision and speech recognition to forecasting trajectories in autonomous vehicles, deep learning approaches are at the forefront of many domains. Deep learning models are developed using a plethora of high-level, generic…

Machine Learning · Computer Science 2021-05-07 Hamid Tabani , Ajay Balasubramaniam , Elahe Arani , Bahram Zonooz

The difficulty of deploying various deep learning (DL) models on diverse DL hardware has boosted the research and development of DL compilers in the community. Several DL compilers have been proposed from both industry and academia such as…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-26 Mingzhen Li , Yi Liu , Xiaoyan Liu , Qingxiao Sun , Xin You , Hailong Yang , Zhongzhi Luan , Lin Gan , Guangwen Yang , Depei Qian

This work presents a comprehensive evaluation of neural network graph compilers across heterogeneous hardware platforms, addressing the critical gap between theoretical optimization techniques and practical deployment scenarios. We…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-30 Alireza Furutanpey , Carmen Walser , Philipp Raith , Pantelis A. Frangoudis , Schahram Dustdar

With the success of deep learning techniques in a broad range of application domains, many deep learning software frameworks have been developed and are being updated frequently to adapt to new hardware features and software libraries,…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-10 Pengfei Xu , Shaohuai Shi , Xiaowen Chu

There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications. Equipped with a variety of processing units such as CPUs, GPUs, and NPUs, the…

Machine Learning · Computer Science 2024-05-06 Sicong Liu , Wentao Zhou , Zimu Zhou , Bin Guo , Minfan Wang , Cheng Fang , Zheng Lin , Zhiwen Yu

Deploying deep learning models on embedded devices is an arduous task: oftentimes, there exist no platform-specific instructions, and compilation times can be considerably large due to the limited computational resources available…

Sound · Computer Science 2023-06-21 Teresa Pelinski , Rodrigo Diaz , Adán L. Benito Temprano , Andrew McPherson

A composable infrastructure is defined as resources, such as compute, storage, accelerators and networking, that are shared in a pool and that can be grouped in various configurations to meet application requirements. This freedom to 'mix…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-22 Kaoutar El Maghraoui , Lorraine M. Herger , Chekuri Choudary , Kim Tran , Todd Deshane , David Hanson

Deploying Deep Learning (DL) on embedded end devices is a scorching trend in pervasive computing. Since most Microcontrollers on embedded devices have limited computing power, it is necessary to add a DL accelerator. Embedded Field…

Hardware Architecture · Computer Science 2024-09-17 Chao Qian , Tianheng Ling , Gregor Schiele

In recent years, heterogeneous computing has emerged as the vital way to increase computers' performance and energy efficiency by combining diverse hardware devices, such as Graphics Processing Units (GPUs) and Field Programmable Gate…

Programming Languages · Computer Science 2020-11-02 Michail Papadimitriou , Juan Fumero , Athanasios Stratikopoulos , Foivos S. Zakkak , Christos Kotselidis

Machine learning applications are increasingly deployed not only to serve predictions using static models, but also as tightly-integrated components of feedback loops involving dynamic, real-time decision making. These applications pose a…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-05-23 Robert Nishihara , Philipp Moritz , Stephanie Wang , Alexey Tumanov , William Paul , Johann Schleier-Smith , Richard Liaw , Mehrdad Niknami , Michael I. Jordan , Ion Stoica

Any quantum computing application, once encoded as a quantum circuit, must be compiled before being executable on a quantum computer. Similar to classical compilation, quantum compilation is a sequential process with many compilation steps…

Quantum Physics · Physics 2024-06-25 Nils Quetschlich , Lukas Burgholzer , Robert Wille

Software frameworks for neural networks play a key role in the development and application of deep learning methods. In this paper, we introduce the Chainer framework, which intends to provide a flexible, intuitive, and high performance…

In the last decade, machine learning based compilation has moved from an obscure research niche to a mainstream activity. In this article, we describe the relationship between machine learning and compiler optimisation and introduce the…

Programming Languages · Computer Science 2018-05-10 Zheng Wang , Michael O'Boyle

The field of deep learning has witnessed a remarkable shift towards extremely compute- and memory-intensive neural networks. These newer larger models have enabled researchers to advance state-of-the-art tools across a variety of fields.…

Machine Learning · Computer Science 2022-07-04 Daniel Nichols , Siddharth Singh , Shu-Huai Lin , Abhinav Bhatele

Deep Neural Networks (DNNs) have achieved great success in a variety of machine learning (ML) applications, delivering high-quality inferencing solutions in computer vision, natural language processing, and virtual reality, etc. However,…

Machine Learning · Computer Science 2022-08-29 Xiaofan Zhang , Yao Chen , Cong Hao , Sitao Huang , Yuhong Li , Deming Chen

To increase performance and efficiency, systems use FPGAs as reconfigurable accelerators. A key challenge in designing these systems is partitioning computation between processors and an FPGA. An appropriate division of labor may be…

Hardware Architecture · Computer Science 2021-07-21 Endri Bezati , Mahyar Emami , Jörn Janneck , James Larus