Related papers: A Migratory Near Memory Processing Architecture Ap…

Moving Processing to Data: On the Influence of Processing in Memory on Data Management

Near-Data Processing refers to an architectural hardware and software paradigm, based on the co-location of storage and compute units. Ideally, it will allow application-defined data- or compute-intensive operations to be executed in-situ, i.e.…

Databases · Computer Science 2019-05-14 Tobias Vincon, Andreas Koch, Ilia Petrov

Memory-Based Multi-Processing Method For Big Data Computation

The evolution of the Internet and computer applications has generated a colossal amount of data. These data are referred to as Big Data, and they consist of high-volume, high-velocity, and variable datasets that need to be managed at the right…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-08-13 Youssef Bassil

Near-Memory Computing: Past, Present, and Future

The conventional approach of moving data to the CPU for computation has become a significant performance bottleneck for emerging scale-out data-intensive applications due to their limited data reuse. At the same time, the advancement in 3D…

Hardware Architecture · Computer Science 2019-08-08 Gagandeep Singh, Lorenzo Chelini, Stefano Corda, Ahsan Javed Awan, Sander Stuijk, Roel Jordans, Henk Corporaal, Albert-Jan Boonstra

Processing Data Where It Makes Sense: Enabling In-Memory Computation

Today's systems are overwhelmingly designed to move data to computation. This design choice goes directly against at least three key trends in systems that cause performance, scalability and energy bottlenecks: (1) data access from memory…

Hardware Architecture · Computer Science 2019-03-12 Onur Mutlu, Saugata Ghose, Juan Gómez-Luna, Rachata Ausavarungnirun

A Survey of Near-Data Processing Architectures for Neural Networks

Data-intensive workloads and applications, such as machine learning (ML), are fundamentally limited by traditional computing systems based on the von-Neumann architecture. As data movement operations and energy consumption become key…

Hardware Architecture · Computer Science 2021-12-24 Mehdi Hassanpour, Marc Riera, Antonio González

Near Data Acceleration with Concurrent Host Access

Near-data accelerators (NDAs) that are integrated with main memory have the potential for significant power and performance benefits. Fully realizing these benefits requires the large available memory capacity to be shared between the host…

Hardware Architecture · Computer Science 2020-12-02 Benjamin Y. Cho, Yongkee Kwon, Sangkug Lym, Mattan Erez

Heterogeneous Data-Centric Architectures for Modern Data-Intensive Applications: Case Studies in Machine Learning and Databases

Today's computing systems require moving data back-and-forth between computing resources (e.g., CPUs, GPUs, accelerators) and off-chip main memory so that computation can take place on the data. Unfortunately, this data movement is a major…

Hardware Architecture · Computer Science 2022-05-31 Geraldo F. Oliveira, Amirali Boroumand, Saugata Ghose, Juan Gómez-Luna, Onur Mutlu

Memory-Centric Computing

Memory-centric computing aims to enable computation capability in and near all places where data is generated and stored. As such, it can greatly reduce the large negative performance and energy impact of data access and data movement, by…

Hardware Architecture · Computer Science 2023-09-15 Onur Mutlu

Big Memory Servers and Modern Approaches to Disk-Based Computation

The Big Memory solution is a new computing paradigm facilitated by commodity server platforms that are available today. It exposes a large RAM subsystem to the Operating System and therefore affords application programmers a number of…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-07-26 Po Hao Chen, Kurt Keville

Memory-Centric Computing: Recent Advances in Processing-in-DRAM

Memory-centric computing aims to enable computation capability in and near all places where data is generated and stored. As such, it can greatly reduce the large negative performance and energy impact of data access and data movement, by…

Hardware Architecture · Computer Science 2024-12-30 Onur Mutlu, Ataberk Olgun, Geraldo F. Oliveira, Ismail Emir Yuksel

Accelerator-driven Data Arrangement to Minimize Transformers Run-time on Multi-core Architectures

The increasing complexity of transformer models in artificial intelligence expands their computational costs, memory usage, and energy consumption. Hardware acceleration tackles the ensuing challenges by designing processors and…

Hardware Architecture · Computer Science 2023-12-21 Alireza Amirshahi, Giovanni Ansaloni, David Atienza

Enabling the Adoption of Processing-in-Memory: Challenges, Mechanisms, Future Research Directions

Poor DRAM technology scaling over the course of many years has caused DRAM-based main memory to increasingly become a larger system bottleneck. A major reason for the bottleneck is that data stored within DRAM must be moved across a…

Hardware Architecture · Computer Science 2018-02-02 Saugata Ghose, Kevin Hsieh, Amirali Boroumand, Rachata Ausavarungnirun, Onur Mutlu

A Survey of Resource Management for Processing-in-Memory and Near-Memory Processing Architectures

Due to the amount of data involved in emerging deep learning and big data applications, operations related to data movement have quickly become the bottleneck. Data-centric computing (DCC), as enabled by processing-in-memory (PIM) and…

Hardware Architecture · Computer Science 2020-09-22 Kamil Khan, Sudeep Pasricha, Ryan Gary Kim

MigrantStore: Leveraging Virtual Memory in DRAM-PCM Memory Architecture

With the imminent slowing down of DRAM scaling, Phase Change Memory (PCM) is emerging as a lead alternative for main memory technology. While PCM achieves low energy due to various technology-specific advantages, PCM is significantly slower…

Hardware Architecture · Computer Science 2015-04-17 Hamza Bin Sohail, Balajee Vamanan, T. N. Vijaykumar

Processing Database Joins over a Shared-Nothing System of Multicore Machines

To process a large volume of data, modern data management systems use a collection of machines connected through a network. This paper looks into the feasibility of scaling up such a shared-nothing system while processing a compute- and…

Databases · Computer Science 2018-04-26 Abhirup Chakraborty

Data-Centric and Data-Aware Frameworks for Fundamentally Efficient Data Handling in Modern Computing Systems

There is an explosive growth in the size of the input and/or intermediate data used and generated by modern and emerging applications. Unfortunately, modern computing systems are not capable of handling large amounts of data efficiently.…

Hardware Architecture · Computer Science 2021-09-14 Nastaran Hajinazar

Near Linear OS Scheduling Optimization for Memory Intensive Workloads on Multi-socket Multi-core servers

Multi-socket multi-core servers are used for solving some of the important problems in computing. Remote DRAM accesses can impact performance of certain applications running on such servers. This paper presents a new near linear operating…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-08-07 Suryanarayana Murthy Durbhakula

Memory Aware Load Balance Strategy on a Parallel Branch-and-Bound Application

The latest trends in high-performance computing systems show an increasing demand for the efficient use of large-scale multicore systems, so that highly compute-intensive applications can be executed reasonably well. However, the…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-02-25 Juliana M. N. Silva, Cristina Boeres, Lúcia M. A. Drummond, Artur A. Pessoa

Emulating a large memory with a collection of small ones

Sequential computation is well understood but does not scale well with current technology. Within the next decade, systems will contain large numbers of processors with potentially thousands of processors per chip. Despite this, many…

Hardware Architecture · Computer Science 2015-11-17 James Hanlon

Dalorex: A Data-Local Program Execution and Architecture for Memory-bound Applications

Applications with low data reuse and frequent irregular memory accesses, such as graph or sparse linear algebra workloads, fail to scale well due to memory bottlenecks and poor core utilization. While prior work with prefetching,…

Hardware Architecture · Computer Science 2023-05-05 Marcelo Orenes-Vera, Esin Tureci, David Wentzlaff, Margaret Martonosi