Related papers: Near-Data Processing for Differentiable Machine Le…

A Survey of Near-Data Processing Architectures for Neural Networks

Data-intensive workloads and applications, such as machine learning (ML), are fundamentally limited by traditional computing systems based on the von-Neumann architecture. As data movement operations and energy consumption become key…

Hardware Architecture · Computer Science · 2021-12-24 · Mehdi Hassanpour, Marc Riera, Antonio González

Conduit: Programmer-Transparent Near-Data Processing Using Multiple Compute-Capable Resources in Solid State Drives

Solid-state drives (SSDs) are well suited for near-data processing (NDP) because they: (1) store large application datasets, and (2) support three NDP paradigms: in-storage processing (ISP), processing using DRAM in the SSD (PuD-SSD), and…

Hardware Architecture · Computer Science · 2026-01-27 · Rakesh Nadig, Vamanan Arulchelvan, Mayank Kabra, Harshita Gupta, Rahul Bera, Nika Mansouri Ghiasi, Nanditha Rao, Qingcai Jiang, Andreas Kosmas Kakolyris, Yu Liang, Mohammad Sadrosadati, Onur Mutlu

CODA: Enabling Co-location of Computation and Data for Near-Data Processing

Recent studies have demonstrated that near-data processing (NDP) is an effective technique for improving performance and energy efficiency of data-intensive workloads. However, leveraging NDP in realistic systems with multiple memory…

Hardware Architecture · Computer Science · 2018-12-05 · Hyojong Kim, Ramyad Hadidi, Lifeng Nai, Hyesoon Kim, Nuwan Jayasena, Yasuko Eckert, Onur Kayiran, Gabriel H. Loh

NearPM: A Near-Data Processing System for Storage-Class Applications

Persistent Memory (PM) technologies enable program recovery to a consistent state in a case of failure. To ensure this crash-consistent behavior, programs need to enforce persist ordering by employing mechanisms, such as logging and…

Computational Engineering, Finance, and Science · Computer Science · 2023-04-03 · Yasas Seneviratne, Korakit Seemakhupt, Sihang Liu, Samira Khan

Accelerating Multi-Scale Deformable Attention Using Near-Memory-Processing Architecture

Multi Scale Deformable Attention (MSDAttn) has become a fundamental component in various vision tasks due to its effective multi scale grid sampling (MSGS). However, its reliance on random sampling results in highly irregular memory access…

Hardware Architecture · Computer Science · 2026-03-03 · Huize Li, Qinggang Wang, Bing Gao, Dan Chen, Yu Huang, Xin Xin

Proxics: an efficient programming model for far memory accelerators

The use of disaggregated or far memory systems such as CXL memory pools has renewed interest in Near-Data Processing (NDP): situating cores close to memory to reduce bandwidth requirements to and from the CPU. Hardware designs for such…

Operating Systems · Computer Science · 2026-04-21 · Zikai Liu, Niels Pressel, Jasmin Schult, Roman Meier, Pengcheng Xu, Timothy Roscoe

The Nearest-Neighbor Derivative Process: Modeling Spatial Rates of Change in Massive Datasets

Gaussian processes (GPs) are instrumental in modeling spatial processes, offering precise interpolation and prediction capabilities across fields such as environmental science and biology. Recently, there has been growing interest in…

Methodology · Statistics · 2025-09-04 · Jiawen Chen, Aritra Halder, Yun Li, Sudipto Banerjee, Didong Li

On-Disk Data Processing: Issues and Future Directions

In this paper, we present a survey of "on-disk" data processing (ODDP). ODDP, which is a form of near-data processing, refers to the computing arrangement where the secondary storage drives have the data processing capability. Proposed ODDP…

Distributed, Parallel, and Cluster Computing · Computer Science · 2017-09-11 · Mayank Mishra, Arun K. Somani

Low-overhead General-purpose Near-Data Processing in CXL Memory Expanders

Emerging Compute Express Link (CXL) enables cost-efficient memory expansion beyond the local DRAM of processors. While its CXL.mem protocol provides minimal latency overhead through an optimized protocol stack, frequent CXL memory…

Hardware Architecture · Computer Science · 2024-10-07 · Hyungkyu Ham, Jeongmin Hong, Geonwoo Park, Yunseon Shin, Okkyun Woo, Wonhyuk Yang, Jinhoon Bae, Eunhyeok Park, Hyojin Sung, Euicheol Lim, Gwangsun Kim

Moving Processing to Data: On the Influence of Processing in Memory on Data Management

Near-Data Processing refers to an architectural hardware and software paradigm, based on the co-location of storage and compute units. Ideally, it will allow to execute application-defined data- or compute-intensive operations in-situ, i.e.…

Databases · Computer Science · 2019-05-14 · Tobias Vincon, Andreas Koch, Ilia Petrov

PIM-Opt: Demystifying Distributed Optimization Algorithms on a Real-World Processing-In-Memory System

Modern Machine Learning (ML) training on large-scale datasets is a very time-consuming workload. It relies on the optimization algorithm Stochastic Gradient Descent (SGD) due to its effectiveness, simplicity, and generalization performance.…

Hardware Architecture · Computer Science · 2024-09-30 · Steve Rhyner, Haocong Luo, Juan Gómez-Luna, Mohammad Sadrosadati, Jiawei Jiang, Ataberk Olgun, Harshita Gupta, Ce Zhang, Onur Mutlu

Near Data Acceleration with Concurrent Host Access

Near-data accelerators (NDAs) that are integrated with main memory have the potential for significant power and performance benefits. Fully realizing these benefits requires the large available memory capacity to be shared between the host…

Hardware Architecture · Computer Science · 2020-12-02 · Benjamin Y. Cho, Yongkee Kwon, Sangkug Lym, Mattan Erez

DP-NMT: Scalable Differentially-Private Machine Translation

Neural machine translation (NMT) is a widely popular text generation task, yet there is a considerable research gap in the development of privacy-preserving NMT models, despite significant data privacy concerns for NMT systems.…

Computation and Language · Computer Science · 2024-04-25 · Timour Igamberdiev, Doan Nam Long Vu, Felix Künnecke, Zhuo Yu, Jannik Holmer, Ivan Habernal

Model Parallelism With Subnetwork Data Parallelism

Pre-training large neural networks at scale imposes heavy memory demands on accelerators and often requires costly communication. We introduce Subnetwork Data Parallelism (SDP), a distributed training framework that partitions a model into…

Machine Learning · Computer Science · 2025-10-06 · Vaibhav Singh, Zafir Khalid, Edouard Oyallon, Eugene Belilovsky

RQP-SGD: Differential Private Machine Learning through Noisy SGD and Randomized Quantization

The rise of IoT devices has prompted the demand for deploying machine learning at-the-edge with real-time, efficient, and secure data processing. In this context, implementing machine learning (ML) models with real-valued weight parameters…

Machine Learning · Computer Science · 2024-02-12 · Ce Feng, Parv Venkitasubramaniam

Scaling up Stochastic Gradient Descent for Non-convex Optimisation

Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions…

Machine Learning · Statistics · 2022-10-07 · Saad Mohamad, Hamad Alamri, Abdelhamid Bouchachia

NNP/MM: Accelerating molecular dynamics simulations with machine learning potentials and molecular mechanics

Machine learning potentials have emerged as a means to enhance the accuracy of biomolecular simulations. However, their application is constrained by the significant computational cost arising from the vast number of parameters compared to…

Biomolecules · Quantitative Biology · 2023-08-29 · Raimondas Galvelis, Alejandro Varela-Rial, Stefan Doerr, Roberto Fino, Peter Eastman, Thomas E. Markland, John D. Chodera, Gianni De Fabritiis

DreamDDP: Accelerating Data Parallel Distributed LLM Training with Layer-wise Scheduled Partial Synchronization

The growth of large language models (LLMs) increases challenges of accelerating distributed training across multiple GPUs in different data centers. Moreover, concerns about data privacy and data exhaustion have heightened interest in…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-02-18 · Zhenheng Tang, Zichen Tang, Junlin Huang, Xinglin Pan, Rudan Yan, Yuxin Wang, Amelie Chi Zhou, Shaohuai Shi, Xiaowen Chu, Bo Li

Near-Policy: Accelerating On-Policy Distillation via Asynchronous Generation and Selective Packing

Standard knowledge distillation for autoregressive models often suffers from distribution mismatch. While on-policy methods mitigate this by leveraging student-generated outputs, they rely on computationally expensive Reinforcement Learning…

Machine Learning · Computer Science · 2026-05-08 · Miao Rang, Zhenni Bi, Hang Zhou, Kai Han, Xuechun Wang, An Xiao, Xinghao Chen, Yunhe Wang, Hanting Chen

RecSSD: Near Data Processing for Solid State Drive Based Recommendation Inference

Neural personalized recommendation models are used across a wide variety of datacenter applications including search, social media, and entertainment. State-of-the-art models comprise large embedding tables that have billions of parameters…

Hardware Architecture · Computer Science · 2021-02-02 · Mark Wilkening, Udit Gupta, Samuel Hsia, Caroline Trippel, Carole-Jean Wu, David Brooks, Gu-Yeon Wei