
Related papers: Conduit: Programmer-Transparent Near-Data Processi…

200 papers

Near-data processing (NDP) refers to augmenting memory or storage with processing power. Despite its potential for accelerating computation and reducing power requirements, only limited progress has been made in popularizing NDP for various…

Hardware Architecture · Computer Science 2017-05-01 Hyeokjun Choe, Seil Lee, Hyunha Nam, Seongsik Park, Seijoon Kim, Eui-Young Chung, Sungroh Yoon

Recent studies have demonstrated that near-data processing (NDP) is an effective technique for improving performance and energy efficiency of data-intensive workloads. However, leveraging NDP in realistic systems with multiple memory…

Hardware Architecture · Computer Science 2018-12-05 Hyojong Kim, Ramyad Hadidi, Lifeng Nai, Hyesoon Kim, Nuwan Jayasena, Yasuko Eckert, Onur Kayiran, Gabriel H. Loh

In this paper, we present a survey of "on-disk" data processing (ODDP). ODDP, a form of near-data processing, refers to a computing arrangement in which secondary storage drives have data processing capability. Proposed ODDP…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-09-11 Mayank Mishra, Arun K. Somani

The use of disaggregated or far memory systems such as CXL memory pools has renewed interest in Near-Data Processing (NDP): situating cores close to memory to reduce bandwidth requirements to and from the CPU. Hardware designs for such…

Operating Systems · Computer Science 2026-04-21 Zikai Liu, Niels Pressel, Jasmin Schult, Roman Meier, Pengcheng Xu, Timothy Roscoe

Data-intensive workloads and applications, such as machine learning (ML), are fundamentally limited by traditional computing systems based on the von Neumann architecture. As data movement operations and energy consumption become key…

Hardware Architecture · Computer Science 2021-12-24 Mehdi Hassanpour, Marc Riera, Antonio González

Persistent Memory (PM) technologies enable program recovery to a consistent state in case of failure. To ensure this crash-consistent behavior, programs need to enforce persist ordering by employing mechanisms such as logging and…

Computational Engineering, Finance, and Science · Computer Science 2023-04-03 Yasas Seneviratne, Korakit Seemakhupt, Sihang Liu, Samira Khan

Emerging Compute Express Link (CXL) enables cost-efficient memory expansion beyond the local DRAM of processors. While its CXL.mem protocol provides minimal latency overhead through an optimized protocol stack, frequent CXL memory…

The idea of the computational storage device (CSD) has come a long way since at least the 1990s [1], [2]. By embedding computing resources within storage devices, CSDs could potentially offload computational tasks from CPUs and enable near-data…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-23 Chao Shi, Anthony Manschula, Tabassum Mahmud, Zeren Yang, Mai Zheng, Yong Chen, Jim Wayda, Matthew Wolf, Byungwoo Bang

Neural personalized recommendation models are used across a wide variety of datacenter applications including search, social media, and entertainment. State-of-the-art models comprise large embedding tables that have billions of parameters…

Hardware Architecture · Computer Science 2021-02-02 Mark Wilkening, Udit Gupta, Samuel Hsia, Caroline Trippel, Carole-Jean Wu, David Brooks, Gu-Yeon Wei

Near-Data-Processing (NDP) architectures present a promising way to alleviate data movement costs and can provide significant performance and energy benefits to parallel applications. Typically, NDP architectures support several NDP units,…

Developing software to effectively take advantage of growth in parallel and distributed processing capacity poses significant challenges. Traditional programming techniques allow a user to assume that execution, message passing, and memory…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-24 Matthew Andres Moreno, Santiago Rodriguez Papa, Charles Ofria

Standard knowledge distillation for autoregressive models often suffers from distribution mismatch. While on-policy methods mitigate this by leveraging student-generated outputs, they rely on computationally expensive Reinforcement Learning…

Machine Learning · Computer Science 2026-05-08 Miao Rang, Zhenni Bi, Hang Zhou, Kai Han, Xuechun Wang, An Xiao, Xinghao Chen, Yunhe Wang, Hanting Chen

Large language model (LLM) decoding is a major inference bottleneck because its low arithmetic intensity makes performance highly sensitive to memory bandwidth. 3D-stacked near-memory processing (NMP) provides substantially higher local…

Hardware Architecture · Computer Science 2026-04-10 Chenyang Ai, Yixing Zhang, Haoran Wu, Yudong Pan, Lechuan Zhao, Wenhui OU

Linear-response time-dependent Density Functional Theory (LR-TDDFT) is a widely used method for accurately predicting the excited-state properties of physical systems. Previous works have attempted to accelerate LR-TDDFT using heterogeneous…

Hardware Architecture · Computer Science 2025-04-07 Qingcai Jiang, Buxin Tu, Xiaoyu Hao, Junshi Chen, Hong An

Neural network potentials (NNPs) offer a powerful alternative to traditional force fields for molecular dynamics (MD) simulations. Accurate and stable MD simulations, crucial for evaluating material properties, require training data…

Machine Learning · Computer Science 2025-06-23 Naoki Matsumura, Yuta Yoshimoto, Yuto Iwasaki, Meguru Yamazaki, Yasufumi Sakai

With the growing number of data-intensive workloads, GPUs, the state-of-the-art single-instruction-multiple-thread (SIMT) processors, are hindered by the memory bandwidth wall. To alleviate this bottleneck, previously proposed…

Hardware Architecture · Computer Science 2021-03-12 Xinfeng Xie, Peng Gu, Yufei Ding, Dimin Niu, Hongzhong Zheng, Yuan Xie

Processing-using-DRAM (PUD) is a paradigm where the analog operational properties of DRAM are used to perform bulk logic operations. While PUD promises high throughput at low energy and area cost, we uncover three limitations of existing…

Near-Data Processing (NDP) has been a promising architectural paradigm to address the memory wall problem for data-intensive applications. Practical implementation of NDP architectures calls for system support for better programmability,…

Hardware Architecture · Computer Science 2025-02-21 Qingcai Jiang, Buxin Tu, Hong An

Real-world applications are now processing big-data sets, often bottlenecked by the data movement between the compute units and the main memory. Near-memory computing (NMC), a modern data-centric computational paradigm, can alleviate these…

Hardware Architecture · Computer Science 2021-06-30 Stefano Corda, Madhurya Kumaraswamy, Ahsan Javed Awan, Roel Jordans, Akash Kumar, Henk Corporaal

Data-free knowledge distillation is able to utilize the knowledge learned by a large teacher network to augment the training of a smaller student network without accessing the original training data, avoiding privacy, security, and…

Computer Vision and Pattern Recognition · Computer Science 2024-06-13 He Liu, Yikai Wang, Huaping Liu, Fuchun Sun, Anbang Yao