Related papers: Accelerating PoT Quantization on Edge Devices

200 papers

Power-of-two (PoT) quantization significantly reduces the size of deep neural networks (DNNs) and replaces multiplications with bit-shift operations for inference. Prior work has shown that PoT-quantized DNNs can preserve accuracy for tasks…

Hardware Architecture · Computer Science 2026-05-08 Rappy Saha , Jude Haris , Nicolas Bohm Agostini , David Kaeli , José Cano

Powers-of-two (PoT) quantization reduces the number of bit operations of deep neural networks on resource-constrained hardware. However, PoT quantization triggers a severe accuracy drop because of its limited representation ability. Since…

Computer Vision and Pattern Recognition · Computer Science 2021-03-23 Yuiko Sakuma , Hiroshi Sumihiro , Jun Nishikawa , Toshiki Nakamura , Ryoji Ikegaya

Deep neural networks virtually dominate the domain of most modern vision systems, providing high performance at a cost of increased computational complexity. Since for those systems it is often required to operate both in real-time and with…

Computer Vision and Pattern Recognition · Computer Science 2023-11-14 Dominika Przewlocka-Rus , Tomasz Kryjak

Deploying Deep Neural Networks in low-power embedded devices for real time-constrained applications requires optimization of memory and computational complexity of the networks, usually by quantizing the weights. Most of the existing works…

Machine Learning · Computer Science 2022-03-11 Dominika Przewlocka-Rus , Syed Shakib Sarwar , H. Ekin Sumbul , Yuecheng Li , Barbara De Salvo

In Large Language Models (LLMs), the number of parameters has grown exponentially in the past few years, e.g., from 1.5 billion parameters in GPT-2 to 175 billion in GPT-3 to possibly more than a trillion in higher versions. This raises a…

Computation and Language · Computer Science 2026-01-06 Mahmoud Elgenedy

Deep Neural Networks (DNNs) have achieved extraordinary performance in various application domains. To support diverse DNN models, efficient implementations of DNN inference on edge-computing platforms, e.g., ASICs, FPGAs, and embedded…

Machine Learning · Computer Science 2020-12-15 Sung-En Chang , Yanyu Li , Mengshu Sun , Runbin Shi , Hayden K. -H. So , Xuehai Qian , Yanzhi Wang , Xue Lin

Conventional multiply-accumulate (MAC) operations have long dominated computation time for deep neural networks (DNNs), especially convolutional neural networks (CNNs). Recently, product quantization (PQ) has been applied to these workloads,…

Hardware Architecture · Computer Science 2024-04-01 Ahmed F. AbouElhamayed , Angela Cui , Javier Fernandez-Marques , Nicholas D. Lane , Mohamed S. Abdelfattah

The large computing and memory cost of deep neural networks (DNNs) often precludes their use in resource-constrained devices. Quantizing the parameters and operations to lower bit-precision offers substantial memory and energy savings for…

Machine Learning · Computer Science 2023-09-01 Clemens JS Schaefer , Siddharth Joshi , Shan Li , Raul Blazquez

Large Language Models (LLMs) have demonstrated remarkable performance across various natural language processing (NLP) tasks. However, their deployment is challenging due to the substantial computational resources required. Power-of-two…

Computation and Language · Computer Science 2025-07-17 Xinyu Wang , Vahid Partovi Nia , Peng Lu , Jerry Huang , Xiao-Wen Chang , Boxing Chen , Yufei Cui

Deep Learning Architectures employ heavy computations, and the bulk of the computational energy is taken up by the convolution operations in Convolutional Neural Networks. The objective of our proposed work is to reduce the energy…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-17 Salman Abdul Khaliq , Rehan Hafiz

It is usually infeasible to fit and train an entire large deep neural network (DNN) model using a single edge device due to the limited resources. To facilitate intelligent applications across edge devices, researchers have proposed…

Machine Learning · Computer Science 2023-11-13 Yuhao Chen , Yuxuan Yan , Qianqian Yang , Yuanchao Shu , Shibo He , Zhiguo Shi , Jiming Chen

Traditional Deep Neural Network (DNN) quantization methods using integer, fixed-point, or floating-point data types struggle to capture diverse DNN parameter distributions at low precision, and often require large silicon overhead and…

Hardware Architecture · Computer Science 2024-03-28 Akshat Ramachandran , Zishen Wan , Geonhwa Jeong , John Gustafson , Tushar Krishna

Neural network quantization aims to reduce the bit-widths of weights and activations, making it a critical technique for deploying deep neural networks on resource-constrained hardware. Most Quantization-Aware Training (QAT) methods rely on…

Machine Learning · Computer Science 2025-09-03 Kaiqi Zhao

Deep neural networks (DNNs) are ubiquitous in computer vision and natural language processing, but suffer from high inference cost. This problem can be addressed by quantization, which consists in converting floating-point operations into a…

Computer Vision and Pattern Recognition · Computer Science 2023-11-28 Edouard Yvinec , Arnaud Dapogny , Kevin Bailly

This paper addresses a challenging problem - how to reduce energy consumption without incurring performance drop when deploying deep neural networks (DNNs) at the inference stage. In order to alleviate the computation and storage burdens,…

Machine Learning · Computer Science 2019-01-09 Xue Geng , Jie Fu , Bin Zhao , Jie Lin , Mohamed M. Sabry Aly , Christopher Pal , Vijay Chandrasekhar

As the machine learning and systems communities strive to achieve higher energy-efficiency through custom deep neural network (DNN) accelerators, varied bit precision or quantization levels, there is a need for design space exploration…

Hardware Architecture · Computer Science 2022-05-27 Ahmet Inci , Siri Garudanagiri Virupaksha , Aman Jain , Venkata Vivek Thallam , Ruizhou Ding , Diana Marculescu

With the surging popularity of edge computing, the need to efficiently perform neural network inference on battery-constrained IoT devices has greatly increased. While algorithmic developments enable neural networks to solve increasingly…

Hardware Architecture · Computer Science 2022-06-27 Maarten Molendijk , Floran de Putter , Henk Corporaal

Deep neural networks (DNN) have achieved impressive success in multiple domains. Over the years, the accuracy of these models has increased with the proliferation of deeper and more complex architectures. Thus, state-of-the-art solutions…

Sound · Computer Science 2022-07-18 Anderson R. Avila , Khalil Bibi , Rui Heng Yang , Xinlin Li , Chao Xing , Xiao Chen

The number of processing elements (PEs) in a fixed-sized systolic accelerator is well matched for large and compute-bound DNNs; whereas, memory-bound DNNs suffer from PE underutilization and fail to achieve peak performance and energy…

Signal Processing · Electrical Eng. & Systems 2020-06-29 Nandan Kumar Jha , Shreyas Ravishankar , Sparsh Mittal , Arvind Kaushik , Dipan Mandal , Mahesh Chandra

Deep neural networks (DNN) are increasingly being accelerated on application-specific hardware such as the Google TPU designed especially for deep learning. Timing speculation is a promising approach to further increase the energy…

Machine Learning · Computer Science 2018-07-03 Jeff Zhang , Siddharth Garg