Related papers: Towards High Performance Java-based Deep Learning …

200 papers

This paper describes our experiences creating Tornado: a practical and efficient heterogeneous programming framework for managed languages. The novel aspect of Tornado is that it turns the programming of heterogeneous systems from an…

Programming Languages · Computer Science 2018-03-02 James Clarkson , Christos Kotselidis

Ray tracing has been typically known as a graphics rendering method capable of producing highly realistic imagery and visual effects generated by computers. More recently the performance improvements in Graphics Processing Units (GPUs) have…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-15 Vinh Pham Van , Juan Fumero , Athanasios Stratikopoulos , Florin Blanaru , Christos Kotselidis

In recent years, heterogeneous computing has emerged as the vital way to increase computers' performance and energy efficiency by combining diverse hardware devices, such as Graphics Processing Units (GPUs) and Field Programmable Gate…

Programming Languages · Computer Science 2020-11-02 Michail Papadimitriou , Juan Fumero , Athanasios Stratikopoulos , Foivos S. Zakkak , Christos Kotselidis

Given their increasing size and complexity, the need for efficient execution of deep neural networks has become increasingly pressing in the design of heterogeneous High-Performance Computing (HPC) and edge platforms, leading to a wide…

Deep learning methods have resulted in significant performance improvements in several application domains and as such several software frameworks have been developed to facilitate their implementation. This paper presents a comparative…

Machine Learning · Computer Science 2016-03-31 Soheil Bahrampour , Naveen Ramakrishnan , Lukas Schott , Mohak Shah

Recently, several JavaScript-based deep learning frameworks have emerged, making it possible to perform deep learning tasks directly in browsers. However, little is known on what and how well we can do with these frameworks for deep…

Software Engineering · Computer Science 2019-03-26 Yun Ma , Dongwei Xiang , Shuyu Zheng , Deyu Tian , Xuanzhe Liu

Deep learning systems are optimized for clusters with homogeneous resources. However, heterogeneity is prevalent in computing infrastructure across edge, cloud and HPC. When training neural networks using stochastic gradient descent…

Machine Learning · Computer Science 2025-03-25 Sahil Tyagi , Prateek Sharma

Hardware heterogeneity is here to stay for high-performance computing. Large-scale systems are currently equipped with multiple GPU accelerators per compute node and are expected to incorporate more specialized hardware in the future. This…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-10-05 Polykarpos Thomadakis , Nikos Chrisochoides

Hardware accelerations of deep learning systems have been extensively investigated in industry and academia. The aim of this paper is to achieve ultra-high energy efficiency and performance for hardware implementations of deep neural…

Machine Learning · Computer Science 2018-02-20 Yanzhi Wang , Caiwen Ding , Zhe Li , Geng Yuan , Siyu Liao , Xiaolong Ma , Bo Yuan , Xuehai Qian , Jian Tang , Qinru Qiu , Xue Lin

Specialized accelerators such as GPUs, TPUs, FPGAs, and custom ASICs have been increasingly deployed to train deep learning models. These accelerators exhibit heterogeneous performance behavior across model architectures. Existing…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-08-24 Deepak Narayanan , Keshav Santhanam , Fiodar Kazhamiaka , Amar Phanishayee , Matei Zaharia

We propose a generic algorithmic building block to accelerate training of machine learning models on heterogeneous compute systems. Our scheme allows to efficiently employ compute accelerators such as GPUs and FPGAs for the training of…

Machine Learning · Computer Science 2017-11-08 Celestine Dünner , Thomas Parnell , Martin Jaggi

In recent years, deep learning has become more and more mature, and as a commonly used algorithm in deep learning, convolutional neural networks have been widely used in various visual tasks. In the past, research based on deep learning…

Artificial Intelligence · Computer Science 2020-12-24 Simin Liu

Recursive neural networks have widely been used by researchers to handle applications with recursively or hierarchically structured data. However, embedded control flow deep learning frameworks such as TensorFlow, Theano, Caffe2, and MXNet…

Machine Learning · Computer Science 2018-09-05 Eunji Jeong , Joo Seong Jeong , Soojeong Kim , Gyeong-In Yu , Byung-Gon Chun

Deep learning is increasingly attracting attention for processing big data. Existing frameworks for deep learning must be set up to specialized computer systems. Gaining sufficient computing resources therefore entails high costs of…

Computer Vision and Pattern Recognition · Computer Science 2017-03-28 Masatoshi Hidaka , Ken Miura , Tatsuya Harada

Deep learning models with convolutional and recurrent networks are now ubiquitous and analyze massive amounts of audio, image, video, text and graph data, with applications in automatic translation, speech-to-text, scene understanding,…

Distributed training frameworks, like TensorFlow, have been proposed as a means to reduce the training time of deep learning models by using a cluster of GPU servers. While such speedups are often desirable---e.g., for rapidly evaluating…

Performance · Computer Science 2019-05-07 Shijian Li , Robert J. Walls , Lijie Xu , Tian Guo

The pervasive adoption of Deep Learning (DL) and Graph Processing (GP) makes it a de facto requirement to build large-scale clusters of heterogeneous accelerators including GPUs and FPGAs. The OpenCL programming framework can be used on the…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-19 Yao Chen , Xin Long , Jiong He , Yuhang Chen , Hongshi Tan , Zhenxiang Zhang , Marianne Winslett , Deming Chen

The widely-adopted practice is to train deep learning models with specialized hardware accelerators, e.g., GPUs or TPUs, due to their superior performance on linear algebra operations. However, this strategy does not employ effectively the…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-21 Yujing Ma , Florin Rusu

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective…

Machine Learning · Computer Science 2019-01-10 Tianqi Chen , Lianmin Zheng , Eddie Yan , Ziheng Jiang , Thierry Moreau , Luis Ceze , Carlos Guestrin , Arvind Krishnamurthy

Due to recent advances in digital technologies, and availability of credible data, an area of artificial intelligence, deep learning, has emerged, and has demonstrated its ability and effectiveness in solving complex learning problems not…

Neural and Evolutionary Computing · Computer Science 2019-01-03 Ahmad Shawahna , Sadiq M. Sait , Aiman El-Maleh