Related papers: Characterizing Deep-Learning I/O Workloads in Tens…

200 papers

Machine Learning applications on HPC systems have been gaining popularity in recent years. The upcoming large scale systems will offer tremendous parallelism for training through GPUs. However, another heavy aspect of Machine Learning is…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-07-05 · Steven W. D. Chien, Artur Podobas, Ivy B. Peng, Stefano Markidis

Efficient execution of deep learning workloads on dataflow architectures is crucial for overcoming memory bottlenecks and maximizing performance. While streaming intermediate results between computation kernels can significantly improve…

Hardware Architecture · Computer Science · 2025-09-24 · Hanchen Ye, Deming Chen

Deep learning methods have resulted in significant performance improvements in several application domains and as such several software frameworks have been developed to facilitate their implementation. This paper presents a comparative…

Machine Learning · Computer Science · 2016-03-31 · Soheil Bahrampour, Naveen Ramakrishnan, Lukas Schott, Mohak Shah

TensorFlow is a popular emerging open-source programming framework supporting the execution of distributed applications on heterogeneous hardware. While TensorFlow has been initially designed for developing Machine Learning (ML)…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-03-03 · Steven W. D. Chien, Stefano Markidis, Vyacheslav Olshevsky, Yaroslav Bulatov, Erwin Laure, Jeffrey S. Vetter

Deep Learning (DL) frameworks such as PyTorch and TensorFlow include runtime infrastructures responsible for executing trained models on target hardware, managing memory, data transfers, and multi-accelerator execution, if applicable.…

Software Engineering · Computer Science · 2024-02-29 · Negar Alizadeh, Fernando Castor

Deep learning (DL) applications are increasingly being deployed on HPC systems, to leverage the massive parallelism and computing power of those systems for DL model training. While significant effort has been put to facilitate distributed…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-03-30 · Elvis Rojas, Albert Njoroge Kahira, Esteban Meneses, Leonardo Bautista Gomez, Rosa M Badia

As LLMs and foundation models scale, checkpoint/restore has become a critical pattern for training and inference. With 3D parallelism (tensor, pipeline, data), checkpointing involves many processes, each managing numerous tensors of varying…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-01-01 · Mikaila J. Gossman, Avinash Maurya, Bogdan Nicolae, Jon C. Calhoun

TensorFlow is a popular deep learning framework used by data scientists to solve a wide range of machine learning and deep learning problems such as image classification and speech recognition. It also operates at a large scale and in…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-12-06 · Niranjan Hasabnis

TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of…

Deep Learning (DL) algorithms have become the de facto choice for data analysis. Several DL implementations -- primarily limited to a single compute node -- such as Caffe, TensorFlow, Theano and Torch have become readily available.…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-04-18 · Abhinav Vishnu, Joseph Manzano, Charles Siegel, Jeff Daily

State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning developers. Identifying and using…

Machine Learning · Computer Science · 2020-07-01 · Yu Emma Wang, Carole-Jean Wu, Xiaodong Wang, Kim Hazelwood, David Brooks

Recursive neural networks have widely been used by researchers to handle applications with recursively or hierarchically structured data. However, embedded control flow deep learning frameworks such as TensorFlow, Theano, Caffe2, and MXNet…

Machine Learning · Computer Science · 2018-09-05 · Eunji Jeong, Joo Seong Jeong, Soojeong Kim, Gyeong-In Yu, Byung-Gon Chun

Training neural networks often uses a machine learning framework such as TensorFlow and Caffe2. These frameworks employ a dataflow model where the NN training is modeled as a directed graph composed of a set of nodes. Operations in neural…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-02-20 · Jiawen Liu, Dong Li, Gokcen Kestor, Jeffrey Vetter

Deep learning is a branch of artificial intelligence employing deep neural network architectures that has significantly advanced the state-of-the-art in computer vision, speech recognition, natural language processing and other domains. In…

Machine Learning · Computer Science · 2016-10-06 · Peter Goldsborough

State-of-the-art deep learning systems such as TensorFlow and PyTorch tightly couple the model with the underlying hardware. This coupling requires the user to modify application logic in order to run the same job across a different set of…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-05-13 · Andrew Or, Haoyu Zhang, Michael J. Freedman

Remote procedure call (RPC) is the backbone of many modern distributed systems. Google's gRPC is one of the most popular open source RPC frameworks available in the community. gRPC is the main communication engine for Google's Deep Learning…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-04-05 · Rajarshi Biswas, Xiaoyi Lu, Dhabaleswar K. Panda

Training deep learning models is a repetitive and resource-intensive process. Data scientists often train several models before landing on a set of parameters (e.g., hyper-parameter tuning) and model architecture (e.g., neural architecture…

Machine Learning · Computer Science · 2025-08-04 · Ties Robroek, Neil Kim Nielsen, Pınar Tözün

Deep learning (DL) has been widely applied to many domains. Unique challenges in engineering DL systems are posed by the programming paradigm shift from traditional systems to DL systems, and performance is one of the challenges.…

Software Engineering · Computer Science · 2022-11-01 · Junming Cao, Bihuan Chen, Chao Sun, Longjie Hu, Shuaihong Wu, Xin Peng

Neural network frameworks such as PyTorch and TensorFlow are the workhorses of numerous machine learning applications ranging from object recognition to machine translation. While these frameworks are versatile and straightforward to use,…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-04-24 · Nicolas Weber, Florian Schmidt, Mathias Niepert, Felipe Huici

Recently, several JavaScript-based deep learning frameworks have emerged, making it possible to perform deep learning tasks directly in browsers. However, little is known on what and how well we can do with these frameworks for deep…

Software Engineering · Computer Science · 2019-03-26 · Yun Ma, Dongwei Xiang, Shuyu Zheng, Deyu Tian, Xuanzhe Liu