Related papers: A Data-Centric Optimization Framework for Machine …

OpTorch: Optimized deep learning architectures for resource limited environments

Deep learning algorithms have made many breakthroughs and have various applications in real life. Computational resources become a bottleneck as the data and complexity of the deep learning pipeline increases. In this paper, we propose…

Machine Learning · Computer Science 2021-05-05 Salman Ahmed, Hammad Naveed

Resource-Efficient Deep Learning: A Survey on Model-, Arithmetic-, and Implementation-Level Techniques

Deep learning is pervasive in our daily life, including self-driving cars, virtual assistants, social network services, healthcare services, face recognition, etc. However, deep neural networks demand substantial compute resources during…

Machine Learning · Computer Science 2024-04-30 JunKyu Lee, Lev Mukhanov, Amir Sabbagh Molahosseini, Umar Minhas, Yang Hua, Jesus Martinez del Rincon, Kiril Dichev, Cheol-Ho Hong, Hans Vandierendonck

Where Is My Training Bottleneck? Hidden Trade-Offs in Deep Learning Preprocessing Pipelines

Preprocessing pipelines in deep learning aim to provide sufficient data throughput to keep the training processes busy. Maximizing resource utilization is becoming more challenging as the throughput of training processes increases with…

Machine Learning · Computer Science 2022-03-28 Alexander Isenko, Ruben Mayer, Jeffrey Jedele, Hans-Arno Jacobsen

cedar: Optimized and Unified Machine Learning Input Data Pipelines

The input data pipeline is an essential component of each machine learning (ML) training job. It is responsible for reading massive amounts of training data, processing batches of samples using complex transformations, and loading them onto…

Machine Learning · Computer Science 2024-11-28 Mark Zhao, Emanuel Adamiak, Christos Kozyrakis

Profiling and Improving the PyTorch Dataloader for high-latency Storage: A Technical Report

A growing number of Machine Learning Frameworks recently made Deep Learning accessible to a wider audience of engineers, scientists, and practitioners, by allowing straightforward use of complex neural network architectures and algorithms.…

Machine Learning · Computer Science 2022-12-08 Ivan Svogor, Christian Eichenberger, Markus Spanring, Moritz Neun, Michael Kopp

Learning with Differentiable Perturbed Optimizers

Machine learning pipelines often rely on optimization procedures to make discrete decisions (e.g., sorting, picking closest neighbors, or shortest paths). Although these discrete decisions are easily computed, they break the…

Machine Learning · Computer Science 2020-06-11 Quentin Berthet, Mathieu Blondel, Olivier Teboul, Marco Cuturi, Jean-Philippe Vert, Francis Bach

Deep Learning Models on CPUs: A Methodology for Efficient Training

GPUs have been favored for training deep learning models due to their highly parallelized architecture. As a result, most studies on training optimization focus on GPUs. There is often a trade-off, however, between cost and efficiency when…

Machine Learning · Computer Science 2023-06-21 Quchen Fu, Ramesh Chukka, Keith Achorn, Thomas Atta-fosu, Deepak R. Canchi, Zhongwei Teng, Jules White, Douglas C. Schmidt

DeepConfig: Automating Data Center Network Topologies Management with Machine Learning

In recent years, many techniques have been developed to improve the performance and efficiency of data center networks. While these techniques provide high accuracy, they are often designed using heuristics that leverage domain-specific…

Networking and Internet Architecture · Computer Science 2017-12-13 Christopher Streiffer, Huan Chen, Theophilus Benson, Asim Kadav

DAPPLE: A Pipelined Data Parallel Approach for Training Large Models

It is a challenging task to train large DNN models on sophisticated GPU platforms with diversified interconnect capabilities. Recently, pipelined training has been proposed as an effective approach for improving device utilization. However,…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-07-03 Shiqing Fan, Yi Rong, Chen Meng, Zongyan Cao, Siyu Wang, Zhen Zheng, Chuan Wu, Guoping Long, Jun Yang, Lixue Xia, Lansong Diao, Xiaoyong Liu, Wei Lin

Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better

Deep Learning has revolutionized the fields of computer vision, natural language understanding, speech recognition, information retrieval and more. However, with the progressive improvements in deep learning models, their number of…

Machine Learning · Computer Science 2024-04-17 Gaurav Menghani

Practical Performance Guarantees for Pipelined DNN Inference

We optimize pipeline parallelism for deep neural network (DNN) inference by partitioning model graphs into $k$ stages and minimizing the running time of the bottleneck stage, including communication. We give practical and effective…

Machine Learning · Computer Science 2024-06-05 Aaron Archer, Matthew Fahrbach, Kuikui Liu, Prakash Prabhu

PyTorchPipe: a framework for rapid prototyping of pipelines combining language and vision

Access to vast amounts of data along with affordable computational power stimulated the reincarnation of neural networks. The progress could not be achieved without adequate software tools, lowering the entry bar for the next generations of…

Machine Learning · Computer Science 2019-10-22 Tomasz Kornuta

A Reinforcement-Learning-Based Energy-Efficient Framework for Multi-Task Video Analytics Pipeline

Deep-learning-based video processing has yielded transformative results in recent years. However, the video analytics pipeline is energy-intensive due to high data rates and reliance on complex inference algorithms, which limits its…

Computer Vision and Pattern Recognition · Computer Science 2021-05-04 Yingying Zhao, Mingzhi Dong, Yujiang Wang, Da Feng, Qin Lv, Robert P. Dick, Dongsheng Li, Tun Lu, Ning Gu, Li Shang

ESPnet-ONNX: Bridging a Gap Between Research and Production

In the field of deep learning, researchers often focus on inventing novel neural network models and improving benchmarks. In contrast, application developers are interested in making models suitable for actual products, which involves…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-15 Masao Someki, Yosuke Higuchi, Tomoki Hayashi, Shinji Watanabe

Optimising Resource Management for Embedded Machine Learning

Machine learning inference is increasingly being executed locally on mobile and embedded platforms, due to the clear advantages in latency, privacy and connectivity. In this paper, we present approaches for online resource management in…

Computer Vision and Pattern Recognition · Computer Science 2021-05-11 Lei Xun, Long Tran-Thanh, Bashir M Al-Hashimi, Geoff V. Merrett

PipeOrgan: Efficient Inter-operation Pipelining with Flexible Spatial Organization and Interconnects

Because of the recent trends in Deep Neural Networks (DNN) models being memory-bound, inter-operator pipelining for DNN accelerators is emerging as a promising optimization. Inter-operator pipelining reduces costly on-chip global memory and…

Hardware Architecture · Computer Science 2024-05-06 Raveesh Garg, Hyoukjun Kwon, Eric Qin, Yu-Hsin Chen, Tushar Krishna, Liangzhen Lai

A Framework to Enable Algorithmic Design Choice Exploration in DNNs

Deep learning technologies, particularly deep neural networks (DNNs), have demonstrated significant success across many domains. This success has been accompanied by substantial advancements and innovations in the algorithms behind the…

Machine Learning · Computer Science 2025-04-14 Timothy L. Cronin, Sanmukh Kuppannagari

Optimizing High-Throughput Distributed Data Pipelines for Reproducible Deep Learning at Scale

Training massive-scale deep learning models on datasets spanning tens of terabytes presents critical challenges in hardware utilization and training reproducibility. In this paper, we identify and resolve profound data-loading bottlenecks…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-24 Kashish Mittal, Di Yu, Roozbeh Ketabi, Arushi Arora, Brendon Lapp, Peng Zhang

Modyn: Data-Centric Machine Learning Pipeline Orchestration

In real-world machine learning (ML) pipelines, datasets are continuously growing. Models must incorporate this new training data to improve generalization and adapt to potential distribution shifts. The cost of model retraining is…

Machine Learning · Computer Science 2025-01-27 Maximilian Böther, Ties Robroek, Viktor Gsteiger, Robin Holzinger, Xianzhe Ma, Pınar Tözün, Ana Klimovic

PipeDream: Fast and Efficient Pipeline Parallel DNN Training

PipeDream is a Deep Neural Network (DNN) training system for GPUs that parallelizes computation by pipelining execution across multiple machines. Its pipeline parallel computing model avoids the slowdowns faced by data-parallel training when…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-06-12 Aaron Harlap, Deepak Narayanan, Amar Phanishayee, Vivek Seshadri, Nikhil Devanur, Greg Ganger, Phil Gibbons