Related papers: A Metaprogramming and Autotuning Framework for Dep…

200 papers

Deep learning frameworks have been widely deployed on GPU servers for deep learning applications in both academia and industry. In training deep neural networks (DNNs), there are many standard processes or algorithms, such as convolution…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-08-21 Shaohuai Shi, Qiang Wang, Xiaowen Chu

Deep learning technologies, particularly deep neural networks (DNNs), have demonstrated significant success across many domains. This success has been accompanied by substantial advancements and innovations in the algorithms behind the…

Machine Learning · Computer Science 2025-04-14 Timothy L. Cronin, Sanmukh Kuppannagari

GPUs are used for training, inference, and tuning of machine learning models. However, Deep Neural Networks (DNNs) vary widely in their ability to exploit the full power of high-performance GPUs. Spatial sharing of the GPU enables multiplexing…

Neural and Evolutionary Computing · Computer Science 2020-08-11 Aditya Dhakal, Junguk Cho, Sameer G. Kulkarni, K. K. Ramakrishnan, Puneet Sharma

In almost every computation-heavy application, from 6G communication systems to autonomous driving platforms, a large portion of the computing should take place near the client side. Edge computing (AI at Edge) in mobile devices is one…

Hardware Architecture · Computer Science 2024-07-29 Seyed Nima Omidsajedi, Rekha Reddy, Jianming Yi, Jan Herbst, Christoph Lipps, Hans Dieter Schotten

NVIDIA cuDNN is a low-level library that provides GPU kernels frequently used in deep learning. Specifically, cuDNN implements several equivalent convolution algorithms, whose performance and memory footprint may vary considerably,…

Machine Learning · Computer Science 2018-04-16 Yosuke Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka

GPUs have been favored for training deep learning models due to their highly parallelized architecture. As a result, most studies on training optimization focus on GPUs. There is often a trade-off, however, between cost and efficiency when…

GPUs are currently the platform of choice for training neural networks. However, training a deep neural network (DNN) is a time-consuming process even on GPUs because of the massive number of parameters that have to be learned. As a result,…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-29 Behnam Pourghassemi, Chenghao Zhang, Joo Hwan Lee, Aparna Chandramowlishwaran

This paper presents a unified framework for codifying and automating optimization strategies to efficiently deploy deep neural networks (DNNs) on resource-constrained hardware, such as FPGAs, while maintaining high performance, accuracy,…

Hardware Architecture · Computer Science 2026-02-11 Zhiqiang Que, Jose G. F. Coutinho, Ce Guo, Hongxiang Fan, Wayne Luk

The training process of Deep Neural Network (DNN) is compute-intensive, often taking days to weeks to train a DNN model. Therefore, parallel execution of DNN training on GPUs is a widely adopted approach to speed up the process nowadays.…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-29 Chi-Chung Chen, Chia-Lin Yang, Hsiang-Yun Cheng

Distributed deep neural networks (DNNs) have become central to modern computer vision, yet their deployment on resource-constrained edge devices remains hindered by substantial parameter counts, computational demands, and the probability of…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-17 Mahadev Sunil Kumar, Arnab Raha, Debayan Das, Gopakumar G, Rounak Chatterjee, Amitava Mukherjee

Deep learning has become widely used in complex AI applications. Yet, training a deep neural network (DNN) model requires a considerable amount of computation, long running times, and a great deal of energy. Nowadays, many-core AI accelerators (e.g.,…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-12 Yuxin Wang, Qiang Wang, Shaohuai Shi, Xin He, Zhenheng Tang, Kaiyong Zhao, Xiaowen Chu

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective…

Machine Learning · Computer Science 2019-01-10 Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Deep neural networks (DNNs) have the advantage that they can take into account a large number of parameters, which enables them to solve complex tasks. In computer vision and speech recognition, they achieve better accuracy than common…

Machine Learning · Computer Science 2021-04-20 Lukas Baischer, Matthias Wess, Nima TaheriNejad

Deep neural networks (DNNs) have been applied ubiquitously in many applications, and accelerators have emerged as an enabler of fast and efficient inference for these applications. However, to achieve high model coverage…

Machine Learning · Computer Science 2021-05-10 Zhi Chen, Cody Hao Yu, Trevor Morris, Jorn Tuyls, Yi-Hsiang Lai, Jared Roesch, Elliott Delaye, Vin Sharma, Yida Wang

Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI) applications including computer vision, speech recognition, and robotics. While DNNs deliver state-of-the-art accuracy on many AI tasks, this comes at…

Computer Vision and Pattern Recognition · Computer Science 2017-08-15 Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, Joel Emer

Deep neural networks (DNNs) have succeeded in many different perception tasks, e.g., computer vision, natural language processing, and reinforcement learning. High-performing DNNs rely heavily on intensive resource consumption. For…

Machine Learning · Computer Science 2022-10-10 Zhongnan Qu

Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI), including computer vision, natural language processing and speech recognition. However, their superior performance comes at the…

Machine Learning · Computer Science 2022-04-26 Han Cai, Ji Lin, Yujun Lin, Zhijian Liu, Haotian Tang, Hanrui Wang, Ligeng Zhu, Song Han

Nowadays, we are living in an era of extreme device heterogeneity. In addition to the wide variety of conventional CPU architectures, accelerator devices such as GPUs and FPGAs have also come to the foreground, greatly expanding the pool of available…

Machine Learning · Computer Science 2022-08-31 Petros Vavaroutsos, Ioannis Oroutzoglou, Dimosthenis Masouros, Dimitrios Soudris

As an emerging trend in graph-based deep learning, Graph Neural Networks (GNNs) excel in their capability to generate high-quality node feature vectors (embeddings). However, the existing one-size-fits-all GNN implementations are…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-22 Yuke Wang, Boyuan Feng, Gushu Li, Shuangchen Li, Lei Deng, Yuan Xie, Yufei Ding

Attention Graph Neural Networks (AT-GNNs), such as GAT and Graph Transformer, have demonstrated superior performance compared to other GNNs. However, existing GNN systems struggle to efficiently train AT-GNNs on GPUs due to their intricate…

Machine Learning · Computer Science 2024-11-26 Jiahui Liu, Zhenkun Cai, Zhiyong Chen, Minjie Wang