English
Related papers

Related papers: Accelerating Machine Learning Inference with GPUs …

200 papers

Machine learning algorithms are becoming increasingly prevalent and performant in the reconstruction of events in accelerator-based neutrino experiments. These sophisticated algorithms can be computationally expensive. At the same time, the…

GPU-accelerated computing is a key technology to realize high-speed inference servers using deep neural networks (DNNs). An important characteristic of GPU-based inference is that the computational efficiency, in terms of the processing…

Performance · Computer Science 2021-01-13 Yoshiaki Inoue

New heterogeneous computing paradigms on dedicated hardware with increased parallelization, such as Field Programmable Gate Arrays (FPGAs), offer exciting solutions with large potential gains. The growing applications of machine learning…

Edge computing's growing prominence, due to its ability to reduce communication latency and enable real-time processing, is promoting the rise of high-performance, heterogeneous System-on-Chip solutions. While current approaches often…

Artificial Intelligence · Computer Science 2024-09-24 Rakshith Jayanth , Neelesh Gupta , Viktor Prasanna

Video and image streaming on edge devices requires low latency. To address this, Neural Networks (NNs) are widely used, and prior work mainly focuses on accelerating them with single hardware units such as Graphics Processing Units (GPUs),…

Hardware Architecture · Computer Science 2026-05-04 Ali Emre Oztas , Mahir Demir , James Garside , Mikel Luj'an

In the next decade, the demands for computing in large scientific experiments are expected to grow tremendously. During the same time period, CPU performance increases will be limited. At the CERN Large Hadron Collider (LHC), these two…

Intensive computation is entering data centers with multiple workloads of deep learning. To balance the compute efficiency, performance, and total cost of ownership (TCO), the use of a field-programmable gate array (FPGA) with…

Computer Vision and Pattern Recognition · Computer Science 2019-09-19 Xiaoyu Yu , Yuwei Wang , Jie Miao , Ephrem Wu , Heng Zhang , Yu Meng , Bo Zhang , Biao Min , Dewei Chen , Jianlin Gao

Standardized DNN models that have been proved to perform well on machine learning tasks are widely used and often adopted as-is to solve downstream tasks, forming the transfer learning paradigm. However, when serving multiple instances of…

Machine Learning · Computer Science 2020-09-29 Joo Seong Jeong , Soojeong Kim , Gyeong-In Yu , Yunseong Lee , Byung-Gon Chun

Deploying deep learning models in cloud clusters provides efficient and prompt inference services to accommodate the widespread application of deep learning. These clusters are usually equipped with host CPUs and accelerators with distinct…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-07-24 Zinuo Cai , Hao Wang , Tao Song , Yang Hua , Ruhui Ma , Haibing Guan

With the fast development of deep neural networks (DNNs), many real-world applications are adopting multiple models to conduct compound tasks, such as co-running classification, detection, and segmentation models on autonomous vehicles.…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-11-30 Fuxun Yu , Shawn Bray , Di Wang , Longfei Shangguan , Xulong Tang , Chenchen Liu , Xiang Chen

Graph neural network (GNN) inference faces significant bottlenecks in preprocessing, which often dominate overall inference latency. We introduce AutoGNN, an FPGA-based accelerator designed to address these challenges by leveraging FPGA's…

In recent years, large language models have demonstrated remarkable performance across various natural language processing (NLP) tasks. However, deploying these models for real-world applications often requires efficient inference solutions…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-06-13 Ditto PS , Jithin VG , Adarsh MS

Accelerating the deep learning inference is very important for real-time applications. In this paper, we propose a novel method to fuse the layers of convolutional neural networks (CNNs) on Graphics Processing Units (GPUs), which applies…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-07-30 Xueying Wang , Guangli Li , Xiao Dong , Jiansong Li , Lei Liu , Xiaobing Feng

Graphics processing units (GPUs) can improve deep neural network inference throughput via batch processing, where multiple tasks are concurrently processed. We focus on novel scenarios that the energy-constrained mobile devices offload…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-14 Wenqi Shi , Sheng Zhou , Zhisheng Niu , Miao Jiang , Lu Geng

Graph neural networks (GNNs) have recently exploded in popularity thanks to their broad applicability to graph-related problems such as quantum chemistry, drug discovery, and high energy physics. However, meeting demand for novel GNN models…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-10-20 Rishov Sarkar , Stefan Abi-Karam , Yuqi He , Lakshmi Sathidevi , Cong Hao

Cloud GPU servers have become the de facto way for deep learning practitioners to train complex models on large-scale datasets. However, it is challenging to determine the appropriate cluster configuration---e.g., server type and…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-08 Shijian Li , Robert J. Walls , Tian Guo

Ensembles of Deep Neural Networks (DNNs) have achieved qualitative predictions but they are computing and memory intensive. Therefore, the demand is growing to make them answer a heavy workload of requests with available computational…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-08-31 Pierrick Pochelu , Serge G. Petiton , Bruno Conche

This paper studies inference acceleration using distributed convolutional neural networks (CNNs) in collaborative edge computing. To ensure inference accuracy in inference task partitioning, we consider the receptive-field when performing…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-09-12 Nan Li , Alexandros Iosifidis , Qi Zhang

The demonstrated success of transfer learning has popularized approaches that involve pretraining models from massive data sources and subsequent finetuning towards a specific task. While such approaches have become the norm in fields such…

Convolutions are the core operation of deep learning applications based on Convolutional Neural Networks (CNNs). Current GPU architectures are highly efficient for training and deploying deep CNNs, and hence, these are largely used in…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-28 Marc Jordà , Pedro Valero-Lara , Antonio J. Peña
‹ Prev 1 2 3 10 Next ›