Related papers: Conduit: Programmer-Transparent Near-Data Processi…

Near-Data Processing for Differentiable Machine Learning Models

Near-data processing (NDP) refers to augmenting memory or storage with processing power. Despite its potential for accelerating computing and reducing power requirements, only limited progress has been made in popularizing NDP for various…

Hardware Architecture · Computer Science 2017-05-01 Hyeokjun Choe, Seil Lee, Hyunha Nam, Seongsik Park, Seijoon Kim, Eui-Young Chung, Sungroh Yoon

CODA: Enabling Co-location of Computation and Data for Near-Data Processing

Recent studies have demonstrated that near-data processing (NDP) is an effective technique for improving performance and energy efficiency of data-intensive workloads. However, leveraging NDP in realistic systems with multiple memory…

Hardware Architecture · Computer Science 2018-12-05 Hyojong Kim, Ramyad Hadidi, Lifeng Nai, Hyesoon Kim, Nuwan Jayasena, Yasuko Eckert, Onur Kayiran, Gabriel H. Loh

On-Disk Data Processing: Issues and Future Directions

In this paper, we present a survey of "on-disk" data processing (ODDP). ODDP, which is a form of near-data processing, refers to the computing arrangement where the secondary storage drives have the data processing capability. Proposed ODDP…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-09-11 Mayank Mishra, Arun K. Somani

Proxics: an efficient programming model for far memory accelerators

The use of disaggregated or far memory systems such as CXL memory pools has renewed interest in Near-Data Processing (NDP): situating cores close to memory to reduce bandwidth requirements to and from the CPU. Hardware designs for such…

Operating Systems · Computer Science 2026-04-21 Zikai Liu, Niels Pressel, Jasmin Schult, Roman Meier, Pengcheng Xu, Timothy Roscoe

A Survey of Near-Data Processing Architectures for Neural Networks

Data-intensive workloads and applications, such as machine learning (ML), are fundamentally limited by traditional computing systems based on the von-Neumann architecture. As data movement operations and energy consumption become key…

Hardware Architecture · Computer Science 2021-12-24 Mehdi Hassanpour, Marc Riera, Antonio González

NearPM: A Near-Data Processing System for Storage-Class Applications

Persistent Memory (PM) technologies enable program recovery to a consistent state in case of failure. To ensure this crash-consistent behavior, programs need to enforce persist ordering by employing mechanisms, such as logging and…

Computational Engineering, Finance, and Science · Computer Science 2023-04-03 Yasas Seneviratne, Korakit Seemakhupt, Sihang Liu, Samira Khan

Low-overhead General-purpose Near-Data Processing in CXL Memory Expanders

Emerging Compute Express Link (CXL) enables cost-efficient memory expansion beyond the local DRAM of processors. While its CXL.mem protocol provides minimal latency overhead through an optimized protocol stack, frequent CXL memory…

Hardware Architecture · Computer Science 2024-10-07 Hyungkyu Ham, Jeongmin Hong, Geonwoo Park, Yunseon Shin, Okkyun Woo, Wonhyuk Yang, Jinhoon Bae, Eunhyeok Park, Hyojin Sung, Euicheol Lim, Gwangsun Kim

Revisiting Computational Storage for Data Integrity and Security

The idea of the computational storage device (CSD) has come a long way since at least the 1990s [1], [2]. By embedding computing resources within storage devices, CSDs could potentially offload computational tasks from CPUs and enable near-data…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-23 Chao Shi, Anthony Manschula, Tabassum Mahmud, Zeren Yang, Mai Zheng, Yong Chen, Jim Wayda, Matthew Wolf, Byungwoo Bang

RecSSD: Near Data Processing for Solid State Drive Based Recommendation Inference

Neural personalized recommendation models are used across a wide variety of datacenter applications including search, social media, and entertainment. State-of-the-art models comprise large embedding tables that have billions of parameters…

Hardware Architecture · Computer Science 2021-02-02 Mark Wilkening, Udit Gupta, Samuel Hsia, Caroline Trippel, Carole-Jean Wu, David Brooks, Gu-Yeon Wei

SynCron: Efficient Synchronization Support for Near-Data-Processing Architectures

Near-Data-Processing (NDP) architectures present a promising way to alleviate data movement costs and can provide significant performance and energy benefits to parallel applications. Typically, NDP architectures support several NDP units,…

Hardware Architecture · Computer Science 2021-02-16 Christina Giannoula, Nandita Vijaykumar, Nikela Papadopoulou, Vasileios Karakostas, Ivan Fernandez, Juan Gómez-Luna, Lois Orosa, Nectarios Koziris, Georgios Goumas, Onur Mutlu

Conduit: A C++ Library for Best-effort High Performance Computing

Developing software to effectively take advantage of growth in parallel and distributed processing capacity poses significant challenges. Traditional programming techniques allow a user to assume that execution, message passing, and memory…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-24 Matthew Andres Moreno, Santiago Rodriguez Papa, Charles Ofria

Near-Policy: Accelerating On-Policy Distillation via Asynchronous Generation and Selective Packing

Standard knowledge distillation for autoregressive models often suffers from distribution mismatch. While on-policy methods mitigate this by leveraging student-generated outputs, they rely on computationally expensive Reinforcement Learning…

Machine Learning · Computer Science 2026-05-08 Miao Rang, Zhenni Bi, Hang Zhou, Kai Han, Xuechun Wang, An Xiao, Xinghao Chen, Yunhe Wang, Hanting Chen

Rethinking Compute Substrates for 3D-Stacked Near-Memory LLM Decoding: Microarchitecture-Scheduling Co-Design

Large language model (LLM) decoding is a major inference bottleneck because its low arithmetic intensity makes performance highly sensitive to memory bandwidth. 3D-stacked near-memory processing (NMP) provides substantially higher local…

Hardware Architecture · Computer Science 2026-04-10 Chenyang Ai, Yixing Zhang, Haoran Wu, Yudong Pan, Lechuan Zhao, Wenhui OU

NDFT: Accelerating Density Functional Theory Calculations via Hardware/Software Co-Design on Near-Data Computing System

Linear-response time-dependent Density Functional Theory (LR-TDDFT) is a widely used method for accurately predicting the excited-state properties of physical systems. Previous works have attempted to accelerate LR-TDDFT using heterogeneous…

Hardware Architecture · Computer Science 2025-04-07 Qingcai Jiang, Buxin Tu, Xiaoyu Hao, Junshi Chen, Hong An

Knowledge Distillation Framework for Accelerating High-Accuracy Neural Network-Based Molecular Dynamics Simulations

Neural network potentials (NNPs) offer a powerful alternative to traditional force fields for molecular dynamics (MD) simulations. Accurate and stable MD simulations, crucial for evaluating material properties, require training data…

Machine Learning · Computer Science 2025-06-23 Naoki Matsumura, Yuta Yoshimoto, Yuto Iwasaki, Meguru Yamazaki, Yasufumi Sakai

MPU: Towards Bandwidth-abundant SIMT Processor via Near-bank Computing

With the growing number of data-intensive workloads, GPU, which is the state-of-the-art single-instruction-multiple-thread (SIMT) processor, is hindered by the memory bandwidth wall. To alleviate this bottleneck, previously proposed…

Hardware Architecture · Computer Science 2021-03-12 Xinfeng Xie, Peng Gu, Yufei Ding, Dimin Niu, Hongzhong Zheng, Yuan Xie

Proteus: Enabling High-Performance Processing-Using-DRAM with Dynamic Bit-Precision, Adaptive Data Representation, and Flexible Arithmetic

Processing-using-DRAM (PUD) is a paradigm where the analog operational properties of DRAM are used to perform bulk logic operations. While PUD promises high throughput at low energy and area cost, we uncover three limitations of existing…

Hardware Architecture · Computer Science 2025-06-13 Geraldo F. Oliveira, Mayank Kabra, Yuxin Guo, Kangqi Chen, A. Giray Yağlıkçı, Melina Soysal, Mohammad Sadrosadati, Joaquin Olivares Bueno, Saugata Ghose, Juan Gómez-Luna, Onur Mutlu

NDPage: Efficient Address Translation for Near-Data Processing Architectures via Tailored Page Table

Near-Data Processing (NDP) has been a promising architectural paradigm to address the memory wall problem for data-intensive applications. Practical implementation of NDP architectures calls for system support for better programmability,…

Hardware Architecture · Computer Science 2025-02-21 Qingcai Jiang, Buxin Tu, Hong An

NMPO: Near-Memory Computing Profiling and Offloading

Real-world applications are now processing big-data sets, often bottlenecked by the data movement between the compute units and the main memory. Near-memory computing (NMC), a modern data-centric computational paradigm, can alleviate these…

Hardware Architecture · Computer Science 2021-06-30 Stefano Corda, Madhurya Kumaraswamy, Ahsan Javed Awan, Roel Jordans, Akash Kumar, Henk Corporaal

Small Scale Data-Free Knowledge Distillation

Data-free knowledge distillation is able to utilize the knowledge learned by a large teacher network to augment the training of a smaller student network without accessing the original training data, avoiding privacy, security, and…

Computer Vision and Pattern Recognition · Computer Science 2024-06-13 He Liu, Yikai Wang, Huaping Liu, Fuchun Sun, Anbang Yao