Related papers: Near-Data Processing for Differentiable Machine Le…

200 papers

Data-intensive workloads and applications, such as machine learning (ML), are fundamentally limited by traditional computing systems based on the von-Neumann architecture. As data movement operations and energy consumption become key…

Hardware Architecture · Computer Science 2021-12-24 Mehdi Hassanpour, Marc Riera, Antonio González

Solid-state drives (SSDs) are well suited for near-data processing (NDP) because they: (1) store large application datasets, and (2) support three NDP paradigms: in-storage processing (ISP), processing using DRAM in the SSD (PuD-SSD), and…

Recent studies have demonstrated that near-data processing (NDP) is an effective technique for improving performance and energy efficiency of data-intensive workloads. However, leveraging NDP in realistic systems with multiple memory…

Hardware Architecture · Computer Science 2018-12-05 Hyojong Kim, Ramyad Hadidi, Lifeng Nai, Hyesoon Kim, Nuwan Jayasena, Yasuko Eckert, Onur Kayiran, Gabriel H. Loh

Persistent Memory (PM) technologies enable program recovery to a consistent state in a case of failure. To ensure this crash-consistent behavior, programs need to enforce persist ordering by employing mechanisms, such as logging and…

Computational Engineering, Finance, and Science · Computer Science 2023-04-03 Yasas Seneviratne, Korakit Seemakhupt, Sihang Liu, Samira Khan

Multi Scale Deformable Attention (MSDAttn) has become a fundamental component in various vision tasks due to its effective multi scale grid sampling (MSGS). However, its reliance on random sampling results in highly irregular memory access…

Hardware Architecture · Computer Science 2026-03-03 Huize Li, Qinggang Wang, Bing Gao, Dan Chen, Yu Huang, Xin Xin

The use of disaggregated or far memory systems such as CXL memory pools has renewed interest in Near-Data Processing (NDP): situating cores close to memory to reduce bandwidth requirements to and from the CPU. Hardware designs for such…

Operating Systems · Computer Science 2026-04-21 Zikai Liu, Niels Pressel, Jasmin Schult, Roman Meier, Pengcheng Xu, Timothy Roscoe

Gaussian processes (GPs) are instrumental in modeling spatial processes, offering precise interpolation and prediction capabilities across fields such as environmental science and biology. Recently, there has been growing interest in…

Methodology · Statistics 2025-09-04 Jiawen Chen, Aritra Halder, Yun Li, Sudipto Banerjee, Didong Li

In this paper, we present a survey of "on-disk" data processing (ODDP). ODDP, which is a form of near-data processing, refers to the computing arrangement where the secondary storage drives have the data processing capability. Proposed ODDP…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-09-11 Mayank Mishra, Arun K. Somani

Emerging Compute Express Link (CXL) enables cost-efficient memory expansion beyond the local DRAM of processors. While its CXL.mem protocol provides minimal latency overhead through an optimized protocol stack, frequent CXL memory…

Near-Data Processing refers to an architectural hardware and software paradigm, based on the co-location of storage and compute units. Ideally, it will allow to execute application-defined data- or compute-intensive operations in-situ, i.e.…

Databases · Computer Science 2019-05-14 Tobias Vincon, Andreas Koch, Ilia Petrov

Modern Machine Learning (ML) training on large-scale datasets is a very time-consuming workload. It relies on the optimization algorithm Stochastic Gradient Descent (SGD) due to its effectiveness, simplicity, and generalization performance.…

Hardware Architecture · Computer Science 2024-09-30 Steve Rhyner, Haocong Luo, Juan Gómez-Luna, Mohammad Sadrosadati, Jiawei Jiang, Ataberk Olgun, Harshita Gupta, Ce Zhang, Onur Mutlu

Near-data accelerators (NDAs) that are integrated with main memory have the potential for significant power and performance benefits. Fully realizing these benefits requires the large available memory capacity to be shared between the host…

Hardware Architecture · Computer Science 2020-12-02 Benjamin Y. Cho, Yongkee Kwon, Sangkug Lym, Mattan Erez

Neural machine translation (NMT) is a widely popular text generation task, yet there is a considerable research gap in the development of privacy-preserving NMT models, despite significant data privacy concerns for NMT systems.…

Computation and Language · Computer Science 2024-04-25 Timour Igamberdiev, Doan Nam Long Vu, Felix Künnecke, Zhuo Yu, Jannik Holmer, Ivan Habernal

Pre-training large neural networks at scale imposes heavy memory demands on accelerators and often requires costly communication. We introduce Subnetwork Data Parallelism (SDP), a distributed training framework that partitions a model into…

Machine Learning · Computer Science 2025-10-06 Vaibhav Singh, Zafir Khalid, Edouard Oyallon, Eugene Belilovsky

The rise of IoT devices has prompted the demand for deploying machine learning at-the-edge with real-time, efficient, and secure data processing. In this context, implementing machine learning (ML) models with real-valued weight parameters…

Machine Learning · Computer Science 2024-02-12 Ce Feng, Parv Venkitasubramaniam

Stochastic gradient descent (SGD) is a widely adopted iterative method for optimizing differentiable objective functions. In this paper, we propose and discuss a novel approach to scale up SGD in applications involving non-convex functions…

Machine Learning · Statistics 2022-10-07 Saad Mohamad, Hamad Alamri, Abdelhamid Bouchachia

Machine learning potentials have emerged as a means to enhance the accuracy of biomolecular simulations. However, their application is constrained by the significant computational cost arising from the vast number of parameters compared to…

The growth of large language models (LLMs) increases challenges of accelerating distributed training across multiple GPUs in different data centers. Moreover, concerns about data privacy and data exhaustion have heightened interest in…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-18 Zhenheng Tang, Zichen Tang, Junlin Huang, Xinglin Pan, Rudan Yan, Yuxin Wang, Amelie Chi Zhou, Shaohuai Shi, Xiaowen Chu, Bo Li

Standard knowledge distillation for autoregressive models often suffers from distribution mismatch. While on-policy methods mitigate this by leveraging student-generated outputs, they rely on computationally expensive Reinforcement Learning…

Machine Learning · Computer Science 2026-05-08 Miao Rang, Zhenni Bi, Hang Zhou, Kai Han, Xuechun Wang, An Xiao, Xinghao Chen, Yunhe Wang, Hanting Chen

Neural personalized recommendation models are used across a wide variety of datacenter applications including search, social media, and entertainment. State-of-the-art models comprise large embedding tables that have billions of parameters…

Hardware Architecture · Computer Science 2021-02-02 Mark Wilkening, Udit Gupta, Samuel Hsia, Caroline Trippel, Carole-Jean Wu, David Brooks, Gu-Yeon Wei