English
Related papers

Related papers: Bit-Parallel Vector Composability for Neural Accel…

200 papers

Fully realizing the potential of acceleration for Deep Neural Networks (DNNs) requires understanding and leveraging algorithmic properties. This paper builds upon the algorithmic insight that bitwidth of operations in DNNs can be reduced…

Neural and Evolutionary Computing · Computer Science 2018-05-31 Hardik Sharma , Jongse Park , Naveen Suda , Liangzhen Lai , Benson Chau , Joon Kyung Kim , Vikas Chandra , Hadi Esmaeilzadeh

Bit-serial architectures can handle Neural Networks (NNs) with different weight precisions, achieving higher resource efficiency compared with bit-parallel architectures. Besides, the weights contain abundant zero bits owing to the fault…

Hardware Architecture · Computer Science 2023-02-02 Wenhao Sun , Zhiwei Zou , Deng Liu , Wendi Sun , Song Chen , Yi Kang

Despite foreseeing tremendous speedups over conventional deep neural networks, the performance advantage of binarized neural networks (BNNs) has merely been showcased on general-purpose processors such as CPUs and GPUs. In fact, due to…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-12-16 Ang Li , Simon Su

The creation of practical deep learning data-products often requires parallelization across processors and computers to make deep learning feasible on large data sets, but bottlenecks in communication bandwidth make it difficult to attain…

Neural and Evolutionary Computing · Computer Science 2016-02-22 Tim Dettmers

Implementing embedded neural network processing at the edge requires efficient hardware acceleration that couples high computational performance with low power consumption. Driven by the rapid evolution of network architectures and their…

Hardware Architecture · Computer Science 2021-06-25 Petar Jokic , Erfan Azarkhish , Andrea Bonetti , Marc Pons , Stephane Emery , Luca Benini

Transformer models have revolutionized AI tasks, but their large size hinders real-world deployment on resource-constrained and latency-critical edge devices. While binarized Transformers offer a promising solution by significantly reducing…

Machine Learning · Computer Science 2025-05-13 Yuhao Ji , Chao Fang , Shaobo Ma , Haikuo Shao , Zhongfeng Wang

By supporting the access of multiple memory words at the same time, Bit-line Computing (BC) architectures allow the parallel execution of bit-wise operations in-memory. At the array periphery, arithmetic operations are then derived with…

Hardware Architecture · Computer Science 2022-09-14 Marco Rios , Flavio Ponzina , Alexandre Levisse , Giovanni Ansaloni , David Atienza

Neural architectures and hardware accelerators have been two driving forces for the progress in deep learning. Previous works typically attempt to optimize hardware given a fixed model architecture or model architecture given fixed…

FPGA-based hardware accelerators for convolutional neural networks (CNNs) have obtained great attentions due to their higher energy efficiency than GPUs. However, it is challenging for FPGA-based solutions to achieve a higher throughput…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-06-09 Yixing Li , Zichuan Liu , Kai Xu , Hao Yu , Fengbo Ren

Though CNNs are highly parallel workloads, in the absence of efficient on-chip memory reuse techniques, an accelerator for them quickly becomes memory bound. In this paper, we propose a CNN accelerator design for inference that is able to…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-26 Kingshuk Majumder , Shubham Nema , Uday Bondhugula

Recent advancements in neural rendering technologies and their supporting devices have paved the way for immersive 3D experiences, significantly transforming human interaction with intelligent devices across diverse applications. However,…

Graphics · Computer Science 2025-04-01 Chaojian Li , Sixu Li , Linrui Jiang , Jingqun Zhang , Yingyan Celine Lin

Recent research has shown that large language models (LLMs) can utilize low-precision floating point (FP) quantization to deliver high efficiency while maintaining original model accuracy. In particular, recent works have shown the…

Hardware Architecture · Computer Science 2025-06-05 Faraz Tahmasebi , Yian Wang , Benji Y. H. Huang , Hyoukjun Kwon

Convolutional Neural Networks (CNNs) reach high accuracies in various application domains, but require large amounts of computation and incur costly data movements. One method to decrease these costs while trading accuracy is weight and/or…

Hardware Architecture · Computer Science 2022-08-10 Cecilia Latotzke , Tim Ciesielski , Tobias Gemmeke

While Neural Processing Units (NPUs) offer high theoretical efficiency for edge AI, state-of-the-art Vision--Language Models (VLMs) tailored for GPUs often falter on these substrates. We attribute this hardware-model mismatch to two primary…

Computation and Language · Computer Science 2025-12-09 Wei Chen , Liangmin Wu , Yunhai Hu , Zhiyuan Li , Zhiyuan Cheng , Yicheng Qian , Lingyue Zhu , Zhipeng Hu , Luoyi Liang , Qiang Tang , Zhen Liu , Han Yang

Deploying Large Language Models (LLMs) on resource-constrained edge devices faces critical bottlenecks in memory bandwidth and power consumption. While ternary quantization (e.g., BitNet b1.58) significantly reduces model size, its direct…

Hardware Architecture · Computer Science 2026-05-05 Zi-Wei Lin , Tian-Sheuan Chang

Convolutional Neural Networks (CNNs) are widely used in deep learning applications, e.g. visual systems, robotics etc. However, existing software solutions are not efficient. Therefore, many hardware accelerators have been proposed…

Machine Learning · Computer Science 2021-09-08 Sasindu Wijeratne , Sandaruwan Jayaweera , Mahesh Dananjaya , Ajith Pasqual

Neural networks are increasingly used in real-time systems, such as automated driving applications. This requires high-performance hardware with predictable timing behavior. State-of-the-art real-time hardware is limited in memory and…

Hardware Architecture · Computer Science 2024-10-15 Maximilian Kirschner , Konstantin Dudzik , Jürgen Becker

CPU-FPGA heterogeneous architectures are attracting ever-increasing attention in an attempt to advance computational capabilities and energy efficiency in today's datacenters. These architectures provide programmers with the ability to…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-09-24 Jason Cong , Peng Wei , Cody Hao Yu , Peng Zhang

Vector similarity search is a critical component of modern AI systems, but traditional CPU-based implementations face fundamental scalability bottlenecks for billion-scale corpora due to prohibitive computational overhead and memory…

We propose Booster, a novel accelerator for gradient boosting trees based on the unique characteristics of gradient boosting models. We observe that the dominant steps of gradient boosting training (accounting for 90-98% of training time)…

Hardware Architecture · Computer Science 2020-11-06 Mingxuan He , T. N. Vijaykumar , Mithuna Thottethodi
‹ Prev 1 2 3 10 Next ›