Related papers: Dragon-Alpha&cu32: A Java-based Tensor Computing F…

Towards High Performance Java-based Deep Learning Frameworks

The advent of modern cloud services along with the huge volume of data produced on a daily basis, have set the demand for fast and efficient data processing. This demand is common among numerous application domains, such as deep learning,…

Machine Learning · Computer Science 2020-01-14 Athanasios Stratikopoulos, Juan Fumero, Zoran Sevarac, Christos Kotselidis

Learning to Optimize Tensor Programs

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective…

Machine Learning · Computer Science 2019-01-10 Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions

Deep learning models with convolutional and recurrent networks are now ubiquitous and analyze massive amounts of audio, image, video, text and graph data, with applications in automatic translation, speech-to-text, scene understanding,…

Programming Languages · Computer Science 2018-07-02 Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S. Moses, Sven Verdoolaege, Andrew Adams, Albert Cohen

Comparative Study of Deep Learning Software Frameworks

Deep learning methods have resulted in significant performance improvements in several application domains and as such several software frameworks have been developed to facilitate their implementation. This paper presents a comparative…

Machine Learning · Computer Science 2016-03-31 Soheil Bahrampour, Naveen Ramakrishnan, Lukas Schott, Mohak Shah

Podracer architectures for scalable Reinforcement Learning

Supporting state-of-the-art AI research requires balancing rapid prototyping, ease of use, and quick iteration, with the ability to deploy experiments at a scale traditionally associated with production systems. Deep learning frameworks such…

Machine Learning · Computer Science 2021-04-14 Matteo Hessel, Manuel Kroiss, Aidan Clark, Iurii Kemaev, John Quan, Thomas Keck, Fabio Viola, Hado van Hasselt

Dragonfly: a modular deep reinforcement learning library

Dragonfly is a deep reinforcement learning library focused on modularity, in order to ease experimentation and developments. It relies on a json serialization that allows to swap building blocks and perform parameter sweep, while minimizing…

Machine Learning · Computer Science 2025-07-29 Jonathan Viquerat, Paul Garnier, Amirhossein Bateni, Elie Hachem

Democratizing AI: A Comparative Study in Deep Learning Efficiency and Future Trends in Computational Processing

The exponential growth in data has intensified the demand for computational power to train large-scale deep learning models. However, the rapid growth in model size and complexity raises concerns about equal and fair access to computational…

Performance · Computer Science 2026-04-03 Lisan Al Amin, Md Ismail Hossain, Rupak Kumar Das, Mahbubul Islam, Abdulaziz Tabbakh

Moving Deep Learning into Web Browser: How Far Can We Go?

Recently, several JavaScript-based deep learning frameworks have emerged, making it possible to perform deep learning tasks directly in browsers. However, little is known on what and how well we can do with these frameworks for deep…

Software Engineering · Computer Science 2019-03-26 Yun Ma, Dongwei Xiang, Shuyu Zheng, Deyu Tian, Xuanzhe Liu

DIVA-DAF: A Deep Learning Framework for Historical Document Image Analysis

Deep learning methods have shown strong performance in solving tasks for historical document image analysis. However, despite current libraries and frameworks, programming an experiment or a set of experiments and executing them can be…

Computer Vision and Pattern Recognition · Computer Science 2024-02-16 Lars Vögtlin, Anna Scius-Bertrand, Paul Maergner, Andreas Fischer, Rolf Ingold

CosmoFlow: Using Deep Learning to Learn the Universe at Scale

Deep learning is a promising tool to determine the physical model that describes our universe. To handle the considerable computational cost of this problem, we present CosmoFlow: a highly scalable deep learning application built on top of…

Cosmology and Nongalactic Astrophysics · Physics 2018-11-13 Amrita Mathuriya, Deborah Bard, Peter Mendygral, Lawrence Meadows, James Arnemann, Lei Shao, Siyu He, Tuomas Karna, Daina Moise, Simon J. Pennycook, Kristyn Maschoff, Jason Sewall, Nalini Kumar, Shirley Ho, Mike Ringenburg, Prabhat, Victor Lee

Benchmarking State-of-the-Art Deep Learning Software Tools

Deep learning has been shown as a successful machine learning method for a variety of tasks, and its popularity results in numerous open-source deep learning software tools. Training a deep network is usually a very time-consuming process.…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-02-20 Shaohuai Shi, Qiang Wang, Pengfei Xu, Xiaowen Chu

BrainSlug: Transparent Acceleration of Deep Learning Through Depth-First Parallelism

Neural network frameworks such as PyTorch and TensorFlow are the workhorses of numerous machine learning applications ranging from object recognition to machine translation. While these frameworks are versatile and straightforward to use,…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-04-24 Nicolas Weber, Florian Schmidt, Mathias Niepert, Felipe Huici

Deep API Learning Revisited

Understanding the correct API usage sequences is one of the most important tasks for programmers when they work with unfamiliar libraries. However, programmers often encounter obstacles to finding the appropriate information due to either…

Software Engineering · Computer Science 2022-05-04 James Martin, Jin L. C. Guo

Speeding up Deep Learning with Transient Servers

Distributed training frameworks, like TensorFlow, have been proposed as a means to reduce the training time of deep learning models by using a cluster of GPU servers. While such speedups are often desirable, e.g., for rapidly evaluating…

Performance · Computer Science 2019-05-07 Shijian Li, Robert J. Walls, Lijie Xu, Tian Guo

Transparent FPGA Acceleration with TensorFlow

Today, artificial neural networks are one of the major innovators pushing the progress of machine learning. This has particularly affected the development of neural network accelerating hardware. However, since most of these architectures…

Hardware Architecture · Computer Science 2021-02-12 Simon Pfenning, Philipp Holzinger, Marc Reichenbach

TensorLayer: A Versatile Library for Efficient Deep Learning Development

Deep learning has enabled major advances in the fields of computer vision, natural language processing, and multimedia among many others. Developing a deep learning system is arduous and complex, as it involves constructing neural network…

Machine Learning · Computer Science 2017-08-04 Hao Dong, Akara Supratak, Luo Mai, Fangde Liu, Axel Oehmichen, Simiao Yu, Yike Guo

Deep Learning and Machine Learning with GPGPU and CUDA: Unlocking the Power of Parallel Computing

General Purpose Graphics Processing Unit (GPGPU) computing plays a transformative role in deep learning and machine learning by leveraging the computational advantages of parallel processing. Through the power of Compute Unified Device…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-20 Ming Li, Ziqian Bi, Tianyang Wang, Yizhu Wen, Qian Niu, Xinyuan Song, Zekun Jiang, Junyu Liu, Benji Peng, Sen Zhang, Xuanhe Pan, Jiawei Xu, Jinlang Wang, Keyu Chen, Caitlyn Heqi Yin, Pohsun Feng, Ming Liu

A Tour of TensorFlow

Deep learning is a branch of artificial intelligence employing deep neural network architectures that has significantly advanced the state-of-the-art in computer vision, speech recognition, natural language processing and other domains. In…

Machine Learning · Computer Science 2016-10-06 Peter Goldsborough

Tango: A Deep Neural Network Benchmark Suite for Various Accelerators

Deep neural networks (DNNs) have been proving the effectiveness in various computing fields. To provide more efficient computing platforms for DNN applications, it is essential to have evaluation environments that include assorted benchmark…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-01-16 Aajna Karki, Chethan Palangotu Keshava, Spoorthi Mysore Shivakumar, Joshua Skow, Goutam Madhukeshwar Hegde, Hyeran Jeon

TCL: Enabling Fast and Efficient Cross-Hardware Tensor Program Optimization via Continual Learning

Deep learning (DL) compilers rely on cost models and auto-tuning to optimize tensor programs for target hardware. However, existing approaches depend on large offline datasets, incurring high collection costs and offering suboptimal…

Machine Learning · Computer Science 2026-04-15 Chaoyao Shen, Linfeng Jiang, Yixian Shen, Tao Xu, Guoqing Li, Anuj Pathania, Andy D. Pimentel, Meng Zhang