Related papers: Near Data Acceleration with Concurrent Host Access

CODA: Enabling Co-location of Computation and Data for Near-Data Processing

Recent studies have demonstrated that near-data processing (NDP) is an effective technique for improving performance and energy efficiency of data-intensive workloads. However, leveraging NDP in realistic systems with multiple memory…

Hardware Architecture · Computer Science · 2018-12-05 · Hyojong Kim, Ramyad Hadidi, Lifeng Nai, Hyesoon Kim, Nuwan Jayasena, Yasuko Eckert, Onur Kayiran, Gabriel H. Loh

SeDA: Secure and Efficient DNN Accelerators with Hardware/Software Synergy

Ensuring the confidentiality and integrity of DNN accelerators is paramount across various scenarios spanning autonomous driving, healthcare, and finance. However, current security approaches typically require extensive hardware resources,…

Hardware Architecture · Computer Science · 2025-08-27 · Wei Xuan, Zhongrui Wang, Lang Feng, Ning Lin, Zihao Xuan, Rongliang Fu, Tsung-Yi Ho, Yuzhong Jiao, Luhong Liang

Towards Secure and Efficient DNN Accelerators via Hardware-Software Co-Design

The rapid deployment of deep neural network (DNN) accelerators in safety-critical domains such as autonomous vehicles, healthcare systems, and financial infrastructure necessitates robust mechanisms to safeguard data confidentiality and…

Cryptography and Security · Computer Science · 2026-02-25 · Wei Xuan, Zihao Xuan, Rongliang Fu, Ning Lin, Kwunhang Wong, Zikang Yuan, Lang Feng, Zhongrui Wang, Tsung-Yi Ho, Yuzhong Jiao, Luhong Liang

An Energy-Efficient Near-Data Processing Accelerator for DNNs that Optimizes Data Accesses

The constant growth of DNNs makes them challenging to implement and run efficiently on traditional compute-centric architectures. Some accelerators have attempted to add more compute units and on-chip buffers to solve the memory wall…

Hardware Architecture · Computer Science · 2023-10-30 · Bahareh Khabbazan, Marc Riera, Antonio González

A Survey of Near-Data Processing Architectures for Neural Networks

Data-intensive workloads and applications, such as machine learning (ML), are fundamentally limited by traditional computing systems based on the von-Neumann architecture. As data movement operations and energy consumption become key…

Hardware Architecture · Computer Science · 2021-12-24 · Mehdi Hassanpour, Marc Riera, Antonio González

Speed-ANN: Low-Latency and High-Accuracy Nearest Neighbor Search via Intra-Query Parallelism

Nearest Neighbor Search (NNS) has recently seen a rapid increase in interest due to its core role in managing high-dimensional vector data in data science and AI applications. The interest is fueled by the success of neural embedding,…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-02-01 · Zhen Peng, Minjia Zhang, Kai Li, Ruoming Jin, Bin Ren

Near-Memory Computing: Past, Present, and Future

The conventional approach of moving data to the CPU for computation has become a significant performance bottleneck for emerging scale-out data-intensive applications due to their limited data reuse. At the same time, the advancement in 3D…

Hardware Architecture · Computer Science · 2019-08-08 · Gagandeep Singh, Lorenzo Chelini, Stefano Corda, Ahsan Javed Awan, Sander Stuijk, Roel Jordans, Henk Corporaal, Albert-Jan Boonstra

SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures

Near-Data-Processing (NDP) architectures present a promising way to alleviate data movement costs and can provide significant performance and energy benefits to parallel applications. Typically, NDP architectures support several NDP units,…

Hardware Architecture · Computer Science · 2021-02-16 · Christina Giannoula, Nandita Vijaykumar, Nikela Papadopoulou, Vasileios Karakostas, Ivan Fernandez, Juan Gómez-Luna, Lois Orosa, Nectarios Koziris, Georgios Goumas, Onur Mutlu

NVR: Vector Runahead on NPUs for Sparse Memory Access

Deep Neural Networks are increasingly leveraging sparsity to curb the growth in model parameter size. However, reducing wall-clock time through sparsity and pruning remains challenging due to irregular memory access patterns, leading…

Hardware Architecture · Computer Science · 2025-03-19 · Hui Wang, Zhengpeng Zhao, Jing Wang, Yushu Du, Yuan Cheng, Bing Guo, He Xiao, Chenhao Ma, Xiaomeng Han, Dean You, Jiapeng Guan, Ran Wei, Dawei Yang, Zhe Jiang

A Migratory Near Memory Processing Architecture Applied to Big Data Problems

Servers produced by mainstream vendors are inefficient in processing Big Data queries due to bottlenecks inherent in the fundamental architecture of these systems. Current server blades contain multicore processors connected to DRAM memory…

Databases · Computer Science · 2020-03-23 · Ed T. Upchurch

A Scalable Near-Memory Architecture for Training Deep Neural Networks on Large In-Memory Datasets

Most investigations into near-memory hardware accelerators for deep neural networks have primarily focused on inference, while the potential of accelerating training has received relatively little attention so far. Based on an in-depth…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-10-18 · Fabian Schuiki, Michael Schaffner, Frank K. Gürkaynak, Luca Benini

Data-Driven Neuromorphic DRAM-based CNN and RNN Accelerators

The energy consumed by running large deep neural networks (DNNs) on hardware accelerators is dominated by the need for lots of fast memory to store both states and weights. This large required memory is currently only economically viable…

Computer Vision and Pattern Recognition · Computer Science · 2020-03-31 · Tobi Delbruck, Shih-Chii Liu

Resistive Neural Hardware Accelerators

Deep Neural Networks (DNNs), as a subset of Machine Learning (ML) techniques, entail that real-world data can be learned and that decisions can be made in real-time. However, their wide adoption is hindered by a number of software and…

Hardware Architecture · Computer Science · 2021-09-10 · Kamilya Smagulova, Mohammed E. Fouda, Fadi Kurdahi, Khaled Salama, Ahmed Eltawil

Demystifying Memory Access Patterns of FPGA-Based Graph Processing Accelerators

Recent advances in reprogrammable hardware (e.g., FPGAs) and memory technology (e.g., DDR4, HBM) promise to solve performance problems inherent to graph processing like irregular memory access patterns on traditional hardware (e.g., CPU).…

Hardware Architecture · Computer Science · 2021-04-19 · Jonas Dann, Daniel Ritter, Holger Fröning

Non-Relational Databases on FPGAs: Survey, Design Decisions, Challenges

Non-relational database systems (NRDS), such as graph, document, key-value, and wide-column, have gained much attention in various trending (business) application domains like smart logistics, social network analysis, and medical…

Databases · Computer Science · 2020-07-16 · Jonas Dann, Daniel Ritter, Holger Fröning

Proxics: an efficient programming model for far memory accelerators

The use of disaggregated or far memory systems such as CXL memory pools has renewed interest in Near-Data Processing (NDP): situating cores close to memory to reduce bandwidth requirements to and from the CPU. Hardware designs for such…

Operating Systems · Computer Science · 2026-04-21 · Zikai Liu, Niels Pressel, Jasmin Schult, Roman Meier, Pengcheng Xu, Timothy Roscoe

HYDRA: Hybrid Data Multiplexing and Run-time Layer Configurable DNN Accelerator

Deep neural networks (DNNs) offer plenty of challenges in executing efficient computation at edge nodes, primarily due to the huge hardware resource demands. The article proposes HYDRA, hybrid data multiplexing, and runtime layer…

Hardware Architecture · Computer Science · 2026-03-31 · Sonu Kumar, Komal Gupta, Gopal Raut, Mukul Lokhande, Santosh Kumar Vishvakarma

Rethinking Co-design of Neural Architectures and Hardware Accelerators

Neural architectures and hardware accelerators have been two driving forces for the progress in deep learning. Previous works typically attempt to optimize hardware given a fixed model architecture or model architecture given fixed…

Machine Learning · Computer Science · 2021-02-18 · Yanqi Zhou, Xuanyi Dong, Berkin Akin, Mingxing Tan, Daiyi Peng, Tianjian Meng, Amir Yazdanbakhsh, Da Huang, Ravi Narayanaswami, James Laudon

DRMap: A Generic DRAM Data Mapping Policy for Energy-Efficient Processing of Convolutional Neural Networks

Many convolutional neural network (CNN) accelerators face performance- and energy-efficiency challenges which are crucial for embedded implementations, due to high DRAM access latency and energy. Recently, some DRAM architectures have been…

Hardware Architecture · Computer Science · 2023-03-06 · Rachmad Vidya Wicaksana Putra, Muhammad Abdullah Hanif, Muhammad Shafique

Near-Data Processing for Differentiable Machine Learning Models

Near-data processing (NDP) refers to augmenting memory or storage with processing power. Despite its potential for accelerating computation and reducing power requirements, only limited progress has been made in popularizing NDP for various…

Hardware Architecture · Computer Science · 2017-05-01 · Hyeokjun Choe, Seil Lee, Hyunha Nam, Seongsik Park, Seijoon Kim, Eui-Young Chung, Sungroh Yoon