Related papers: Towards High Performance Java-based Deep Learning …

Tornado: A Practical And Efficient Heterogeneous Programming Framework For Managed Languages

This paper describes our experiences creating Tornado: a practical and efficient heterogeneous programming framework for managed languages. The novel aspect of Tornado is that it turns the programming of heterogeneous systems from an…

Programming Languages · Computer Science 2018-03-02 James Clarkson , Christos Kotselidis

Accelerating Java Ray Tracing Applications on Heterogeneous Hardware

Ray tracing has been typically known as a graphics rendering method capable of producing highly realistic imagery and visual effects generated by computers. More recently the performance improvements in Graphics Processing Units (GPUs) have…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-15 Vinh Pham Van , Juan Fumero , Athanasios Stratikopoulos , Florin Blanaru , Christos Kotselidis

Transparent Compiler and Runtime Specializations for Accelerating Managed Languages on FPGAs

In recent years, heterogeneous computing has emerged as the vital way to increase computers? performance and energy efficiency by combining diverse hardware devices, such as Graphics Processing Units (GPUs) and Field Programmable Gate…

Programming Languages · Computer Science 2020-11-02 Michail Papadimitriou , Juan Fumero , Athanasios Stratikopoulos , Foivos S. Zakkak , Christos Kotselidis

A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures

Given their increasing size and complexity, the need for efficient execution of deep neural networks has become increasingly pressing in the design of heterogeneous High-Performance Computing (HPC) and edge platforms, leading to a wide…

Hardware Architecture · Computer Science 2025-05-23 Serena Curzel , Fabrizio Ferrandi , Leandro Fiorin , Daniele Ielmini , Cristina Silvano , Francesco Conti , Luca Bompani , Luca Benini , Enrico Calore , Sebastiano Fabio Schifano , Cristian Zambelli , Maurizio Palesi , Giuseppe Ascia , Enrico Russo , Valeria Cardellini , Salvatore Filippone , Francesco Lo Presti , Stefania Perri

Comparative Study of Deep Learning Software Frameworks

Deep learning methods have resulted in significant performance improvements in several application domains and as such several software frameworks have been developed to facilitate their implementation. This paper presents a comparative…

Machine Learning · Computer Science 2016-03-31 Soheil Bahrampour , Naveen Ramakrishnan , Lukas Schott , Mohak Shah

Moving Deep Learning into Web Browser: How Far Can We Go?

Recently, several JavaScript-based deep learning frameworks have emerged, making it possible to perform deep learning tasks directly in browsers. However, little is known on what and how well we can do with these frameworks for deep…

Software Engineering · Computer Science 2019-03-26 Yun Ma , Dongwei Xiang , Shuyu Zheng , Deyu Tian , Xuanzhe Liu

OmniLearn: A Framework for Distributed Deep Learning over Heterogeneous Clusters

Deep learning systems are optimized for clusters with homogeneous resources. However, heterogeneity is prevalent in computing infrastructure across edge, cloud and HPC. When training neural networks using stochastic gradient descent…

Machine Learning · Computer Science 2025-03-25 Sahil Tyagi , Prateek Sharma

Towards Performance Portable Programming for Distributed Heterogeneous Systems

Hardware heterogeneity is here to stay for high-performance computing. Large-scale systems are currently equipped with multiple GPU accelerators per compute node and are expected to incorporate more specialized hardware in the future. This…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-10-05 Polykarpos Thomadakis , Nikos Chrisochoides

Towards Ultra-High Performance and Energy Efficiency of Deep Learning Systems: An Algorithm-Hardware Co-Optimization Framework

Hardware accelerations of deep learning systems have been extensively investigated in industry and academia. The aim of this paper is to achieve ultra-high energy efficiency and performance for hardware implementations of deep neural…

Machine Learning · Computer Science 2018-02-20 Yanzhi Wang , Caiwen Ding , Zhe Li , Geng Yuan , Siyu Liao , Xiaolong Ma , Bo Yuan , Xuehai Qian , Jian Tang , Qinru Qiu , Xue Lin

Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads

Specialized accelerators such as GPUs, TPUs, FPGAs, and custom ASICs have been increasingly deployed to train deep learning models. These accelerators exhibit heterogeneous performance behavior across model architectures. Existing…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-08-24 Deepak Narayanan , Keshav Santhanam , Fiodar Kazhamiaka , Amar Phanishayee , Matei Zaharia

Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems

We propose a generic algorithmic building block to accelerate training of machine learning models on heterogeneous compute systems. Our scheme allows to efficiently employ compute accelerators such as GPUs and FPGAs for the training of…

Machine Learning · Computer Science 2017-11-08 Celestine Dünner , Thomas Parnell , Martin Jaggi

Overview of FPGA deep learning acceleration based on convolutional neural network

In recent years, deep learning has become more and more mature, and as a commonly used algorithm in deep learning, convolutional neural networks have been widely used in various visual tasks. In the past, research based on deep learning…

Artificial Intelligence · Computer Science 2020-12-24 Simin Liu

Improving the Expressiveness of Deep Learning Frameworks with Recursion

Recursive neural networks have widely been used by researchers to handle applications with recursively or hierarchically structured data. However, embedded control flow deep learning frameworks such as TensorFlow, Theano, Caffe2, and MXNet…

Machine Learning · Computer Science 2018-09-05 Eunji Jeong , Joo Seong Jeong , Soojeong Kim , Gyeong-In Yu , Byung-Gon Chun

Development of JavaScript-based deep learning platform and application to distributed training

Deep learning is increasingly attracting attention for processing big data. Existing frameworks for deep learning must be set up to specialized computer systems. Gaining sufficient computing resources therefore entails high costs of…

Computer Vision and Pattern Recognition · Computer Science 2017-03-28 Masatoshi Hidaka , Ken Miura , Tatsuya Harada

Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions

Deep learning models with convolutional and recurrent networks are now ubiquitous and analyze massive amounts of audio, image, video, text and graph data, with applications in automatic translation, speech-to-text, scene understanding,…

Programming Languages · Computer Science 2018-07-02 Nicolas Vasilache , Oleksandr Zinenko , Theodoros Theodoridis , Priya Goyal , Zachary DeVito , William S. Moses , Sven Verdoolaege , Andrew Adams , Albert Cohen

Speeding up Deep Learning with Transient Servers

Distributed training frameworks, like TensorFlow, have been proposed as a means to reduce the training time of deep learning models by using a cluster of GPU servers. While such speedups are often desirable---e.g., for rapidly evaluating…

Performance · Computer Science 2019-05-07 Shijian Li , Robert J. Walls , Lijie Xu , Tian Guo

HaoCL: Harnessing Large-scale Heterogeneous Processors Made Easy

The pervasive adoption of Deep Learning (DL) and Graph Processing (GP) makes it a de facto requirement to build large-scale clusters of heterogeneous accelerators including GPUs and FPGAs. The OpenCL programming framework can be used on the…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-19 Yao Chen , Xin Long , Jiong He , Yuhang Chen , Hongshi Tan , Zhenxiang Zhang , Marianne Winslett , Deming Chen

Heterogeneous CPU+GPU Stochastic Gradient Descent Algorithms

The widely-adopted practice is to train deep learning models with specialized hardware accelerators, e.g., GPUs or TPUs, due to their superior performance on linear algebra operations. However, this strategy does not employ effectively the…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-21 Yujing Ma , Florin Rusu

Learning to Optimize Tensor Programs

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective…

Machine Learning · Computer Science 2019-01-10 Tianqi Chen , Lianmin Zheng , Eddie Yan , Ziheng Jiang , Thierry Moreau , Luis Ceze , Carlos Guestrin , Arvind Krishnamurthy

FPGA-based Accelerators of Deep Learning Networks for Learning and Classification: A Review

Due to recent advances in digital technologies, and availability of credible data, an area of artificial intelligence, deep learning, has emerged, and has demonstrated its ability and effectiveness in solving complex learning problems not…

Neural and Evolutionary Computing · Computer Science 2019-01-03 Ahmad Shawahna , Sadiq M. Sait , Aiman El-Maleh