Related papers: Optimizing CNN Model Inference on CPUs

200 papers

Leveraging large data sets, deep Convolutional Neural Networks (CNNs) achieve state-of-the-art recognition accuracy. Due to the substantial compute and memory operations, however, they require significant execution time. The massive…

Distributed, Parallel, and Cluster Computing · Computer Science · 2016-10-13 · Chao Li, Yi Yang, Min Feng, Srimat Chakradhar, Huiyang Zhou

Modern deep learning applications increasingly push model inference onto edge devices for multiple reasons, such as achieving shorter latency, relieving the burden on the network connecting to the cloud, and protecting user…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-07-05 · Leyuan Wang, Zhi Chen, Yizhi Liu, Yao Wang, Lianmin Zheng, Mu Li, Yida Wang

This paper optimizes the Convolutional Neural Network (CNN) algorithm using high-performance computing (HPC) technologies. It uses multi-core processors, GPUs, and parallel computing frameworks like OpenMPI and CUDA to speed up CNN model…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-03-11 · Shahrin Rahman

AI spans from large language models to tiny models running on microcontrollers (MCUs). Extremely memory-efficient model architectures are decisive to fit within an MCU's tiny memory budget, e.g., 128 kB of RAM. However, inference latency must…

Machine Learning · Computer Science · 2025-10-20 · Zhaolan Huang, Emmanuel Baccelli

Deploying deep learning models in cloud clusters provides efficient and prompt inference services to accommodate the widespread application of deep learning. These clusters are usually equipped with host CPUs and accelerators with distinct…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-07-24 · Zinuo Cai, Hao Wang, Tao Song, Yang Hua, Ruhui Ma, Haibing Guan

Convolutional Neural Networks (CNNs) are currently adopted to solve an ever greater number of problems, ranging from speech recognition to image classification and segmentation. The large amount of processing required by CNNs calls for…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-06-06 · Kamel Abdelouahab, Maxime Pelcat, Jocelyn Serot, François Berry

GPUs have been favored for training deep learning models due to their highly parallelized architecture. As a result, most studies on training optimization focus on GPUs. There is often a trade-off, however, between cost and efficiency when…

Large language models (LLMs) have shown exceptional performance and vast potential across diverse tasks. However, the deployment of LLMs with high performance in low-resource environments has garnered significant attention in the industry.…

Artificial Intelligence · Computer Science · 2024-07-11 · Pujiang He, Shan Zhou, Wenhuan Huang, Changqing Li, Duyi Wang, Bin Guo, Chen Meng, Sheng Gui, Weifei Yu, Yi Xie

The server central processing unit (CPU) market continues to exhibit robust demand due to the rising global need for computing power. Against this backdrop, CPU benchmark performance prediction is crucial for architecture designers. It…

Performance · Computer Science · 2024-10-29 · Xiaoman Liu

In recent years, large language models have demonstrated remarkable performance across various natural language processing (NLP) tasks. However, deploying these models for real-world applications often requires efficient inference solutions…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-06-13 · Ditto PS, Jithin VG, Adarsh MS

Running deep neural networks on microcontroller units (MCUs) is severely constrained by limited memory resources. While TinyML techniques reduce model size and computation, they often fail in practice due to excessive peak Random Access…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-05-12 · Junyu Lu, Shashwath Suresh, Hao Liu, Qi Hong, Qing Wang

In this paper, we propose different alternatives for convolutional neural network (CNN) segmentation, addressing inference processes on computing architectures composed of multiple Edge TPUs. Specifically, we compare the inference…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-03-04 · Jorge Villarrubia, Luis Costero, Francisco D. Igual, Katzalin Olcoz

Automated design methods for convolutional neural networks (CNNs) have recently been developed in order to increase the design productivity. We propose a neuroevolution method capable of evolving and optimizing CNNs with respect to the…

Neural and Evolutionary Computing · Computer Science · 2019-10-16 · Filip Badan, Lukas Sekanina

In this work we present a new framework for neural networks compression with fine-tuning, which we called Neural Network Compression Framework (NNCF). It leverages recent advances of various network compression methods and implements some…

Computer Vision and Pattern Recognition · Computer Science · 2021-01-01 · Alexander Kozlov, Ivan Lazarevich, Vasily Shamporov, Nikolay Lyalyushkin, Yury Gorbachev

Convolutions are the core operation of deep learning applications based on Convolutional Neural Networks (CNNs). Current GPU architectures are highly efficient for training and deploying deep CNNs, and hence, these are largely used in…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-10-28 · Marc Jordà, Pedro Valero-Lara, Antonio J. Peña

Deep Convolutional Neural Networks (CNNs) are widely employed in modern computer vision algorithms, where the input image is convolved iteratively by many kernels to extract the knowledge behind it. However, with the depth of convolutional…

Computer Vision and Pattern Recognition · Computer Science · 2018-04-11 · Chih-Ting Liu, Yi-Heng Wu, Yu-Sheng Lin, Shao-Yi Chien

This paper presents the Neural Cache architecture, which re-purposes cache structures to transform them into massively parallel compute units capable of running inferences for Deep Neural Networks. Techniques to do in-situ arithmetic in…

Hardware Architecture · Computer Science · 2018-05-11 · Charles Eckert, Xiaowei Wang, Jingcheng Wang, Arun Subramaniyan, Ravi Iyer, Dennis Sylvester, David Blaauw, Reetuparna Das

Convolutional Neural Networks (CNNs) have shown a great deal of success in diverse application domains including computer vision, speech recognition, and natural language processing. However, as the size of datasets and the depth of neural…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-12-07 · Wonje Choi, Karthi Duraisamy, Ryan Gary Kim, Janardhan Rao Doppa, Partha Pratim Pande, Diana Marculescu, Radu Marculescu

Custom dataflow Convolutional Neural Network (CNN) inference accelerators on FPGA are tailored to a specific CNN topology and store parameters in On-Chip Memory (OCM), resulting in high energy efficiency and low inference latency. However,…

Hardware Architecture · Computer Science · 2020-11-17 · Lucian Petrica, Tobias Alonso, Mairin Kroes, Nicholas Fraser, Sorin Cotofana, Michaela Blott

Convolutional Neural Networks are extensively used in a wide range of applications, commonly including computer vision tasks like image and video classification, recognition, and segmentation. Recent research results demonstrate that…

Signal Processing · Electrical Engineering and Systems Science · 2020-05-11 · Marco Carreras, Gianfranco Deriu, Luigi Raffo, Luca Benini, Paolo Meloni