Related papers: Concurrent Processing Memory

Modified SIMD architecture suitable for single-chip implementation

We describe a modified SIMD architecture suitable for single-chip integration of a large number of processing elements, such as 1,000 or more. Important differences from traditional SIMD designs are: a) The size of the memory per processing…

Astrophysics · Physics 2007-05-23 Junichiro Makino

MIMDRAM: An End-to-End Processing-Using-DRAM System for High-Throughput, Energy-Efficient and Programmer-Transparent Multiple-Instruction Multiple-Data Processing

Processing-using-DRAM (PUD) is a processing-in-memory (PIM) approach that uses a DRAM array's massive internal parallelism to execute very-wide data-parallel operations, in a single-instruction multiple-data (SIMD) fashion. However, DRAM…

Hardware Architecture · Computer Science 2024-03-05 Geraldo F. Oliveira , Ataberk Olgun , Abdullah Giray Yağlıkçı , F. Nisa Bostancı , Juan Gómez-Luna , Saugata Ghose , Onur Mutlu

SIMDRAM: A Framework for Bit-Serial SIMD Processing Using DRAM

Processing-using-DRAM has been proposed for a limited set of basic operations (i.e., logic operations, addition). However, in order to enable the full adoption of processing-using-DRAM, it is necessary to provide support for more complex…

Hardware Architecture · Computer Science 2020-12-23 Nastaran Hajinazar , Geraldo F. Oliveira , Sven Gregorio , João Dinis Ferreira , Nika Mansouri Ghiasi , Minesh Patel , Mohammed Alser , Saugata Ghose , Juan Gómez-Luna , Onur Mutlu

SIMD Parallel MCMC Sampling with Applications for Big-Data Bayesian Analytics

Computational intensity and sequential nature of estimation techniques for Bayesian methods in statistics and machine learning, combined with their increasing applications for big data analytics, necessitate both the identification of…

Computation · Statistics 2015-03-02 Alireza S. Mahani , Mansour T. A. Sharabiani

High-Quality Iterative Logic Compiler for In-Memory SIMD Computation with Tight Coupling of Synthesis and Scheduling

In-memory computing (IMC) with single instruction multiple data (SIMD) setup enables memory to perform operations on the stored data in parallel to achieve high throughput and energy saving. To instruct a SIMD IMC hardware to compute a…

Emerging Technologies · Computer Science 2024-12-04 Xingyue Qian , Chenyang Lv , Zhezhi He , Weikang Qian

Search-in-Memory (SiM): Reliable, Versatile, and Efficient Data Matching in SSD's NAND Flash Memory Chip for Data Indexing Acceleration

To index the increasing volume of data, modern data indexes are typically stored on SSDs and cached in DRAM. However, searching such an index has resulted in significant I/O traffic due to limited access locality and inefficient cache…

Hardware Architecture · Computer Science 2024-08-05 Yun-Chih Chen , Yuan-Hao Chang , Tei-Wei Kuo

SIMDRAM: An End-to-End Framework for Bit-Serial SIMD Computing in DRAM

Processing-using-DRAM has been proposed for a limited set of basic operations (i.e., logic operations, addition). However, in order to enable full adoption of processing-using-DRAM, it is necessary to provide support for more complex…

Hardware Architecture · Computer Science 2021-07-01 Nastaran Hajinazar , Geraldo F. Oliveira , Sven Gregorio , João Ferreira , Nika Mansouri Ghiasi , Minesh Patel , Mohammed Alser , Saugata Ghose , Juan Gómez Luna , Onur Mutlu

Learning to Remember More with Less Memorization

Memory-augmented neural networks consisting of a neural controller and an external memory have shown potentials in long-term sequential learning. Current RAM-like memory models maintain memory accessing every timesteps, thus they do not…

Machine Learning · Computer Science 2019-03-21 Hung Le , Truyen Tran , Svetha Venkatesh

Shared-PIM: Enabling Concurrent Computation and Data Flow for Faster Processing-in-DRAM

Processing-in-Memory (PIM) enhances memory with computational capabilities, potentially solving energy and latency issues associated with data transfer between memory and processors. However, managing concurrent computation and data flow…

Hardware Architecture · Computer Science 2025-05-09 Ahmed Mamdouh , Haoran Geng , Michael Niemier , Xiaobo Sharon Hu , Dayane Reis

MASIM: An Efficient Multi-Array Scheduler for In-Memory SIMD Computation

Single instruction, multiple data (SIMD) is a popular design style of in-memory computing (IMC) architectures, which enables memory arrays to perform logic operations to achieve low energy consumption and high parallelism. To implement a…

Emerging Technologies · Computer Science 2024-12-04 Xingyue Qian , Chen Nie , Zhezhi He , Weikang Qian

Synthesis of signal processing algorithms with constraints on minimal parallelism and memory space

This thesis develops signal-processing algorithms and implementation schemes under constraints of minimal parallelism and memory space, with the goal of improving energy efficiency of low-power computing hardware. We propose (i) a…

Signal Processing · Electrical Eng. & Systems 2025-12-30 Sergey Salishev

Benchmarking Memory-Centric Computing Systems: Analysis of Real Processing-in-Memory Hardware

Many modern workloads such as neural network inference and graph processing are fundamentally memory-bound. For such workloads, data movement between memory and CPU cores imposes a significant overhead in terms of both latency and energy. A…

Hardware Architecture · Computer Science 2023-04-04 Juan Gómez-Luna , Izzat El Hajj , Ivan Fernandez , Christina Giannoula , Geraldo F. Oliveira , Onur Mutlu

A Modern Primer on Processing in Memory

This paper discusses recent research that aims to enable computation close to data, an approach we broadly call processing-in-memory (PIM). PIM places computation mechanisms in or near where the data is stored (i.e., inside memory chips or…

Hardware Architecture · Computer Science 2025-02-07 Onur Mutlu , Saugata Ghose , Juan Gómez-Luna , Rachata Ausavarungnirun , Mohammad Sadrosadati , Geraldo F. Oliveira

Streaming Approximation Scheme for Minimizing Total Completion Time on Parallel Machines Subject to Varying Processing Capacity

We study the problem of minimizing total completion time on parallel machines subject to varying processing capacity. In this paper, we develop an approximation scheme for the problem under the data stream model where the input data is…

Data Structures and Algorithms · Computer Science 2022-04-06 Bin Fu , Yumei Huo , Hairong Zhao

PSCNN: A 885.86 TOPS/W Programmable SRAM-based Computing-In-Memory Processor for Keyword Spotting

Computing-in-memory (CIM) has attracted significant attentions in recent years due to its massive parallelism and low power consumption. However, current CIM designs suffer from large area overhead of small CIM macros and bad programmablity…

Hardware Architecture · Computer Science 2022-05-04 Shu-Hung Kuo , Tian-Sheuan Chang

Parallel scheduling of task trees with limited memory

This paper investigates the execution of tree-shaped task graphs using multiple processors. Each edge of such a tree represents some large data. A task can only be executed if all input and output data fit into memory, and a data can only…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-10-02 Lionel Eyraud-Dubois , Loris Marchal , Oliver Sinnen , Frédéric Vivien

FAST: A Fully-Concurrent Access Technique to All SRAM Rows for Enhanced Speed and Energy Efficiency in Data-Intensive Applications

Compute-in-memory (CiM) is a promising approach to improving the computing speed and energy efficiency in dataintensive applications. Beyond existing CiM techniques of bitwise logic-in-memory operations and dot product operations, this…

Hardware Architecture · Computer Science 2023-01-03 Yiming Chen , Yushen Fu , Mingyen Lee , Sumitha George , Yongpan Liu , Vijaykrishnan Narayanan , Huazhong Yang , Xueqing Li

Memory-Centric Computing

Memory-centric computing aims to enable computation capability in and near all places where data is generated and stored. As such, it can greatly reduce the large negative performance and energy impact of data access and data movement, by…

Hardware Architecture · Computer Science 2023-09-15 Onur Mutlu

The Future of Memory: Limits and Opportunities

Memory latency, bandwidth, capacity, and energy increasingly limit performance. In this paper, we reconsider proposed system architectures that consist of huge (many-terabyte to petabyte scale) memories shared among large numbers of CPUs.…

Hardware Architecture · Computer Science 2025-09-24 Samuel Dayo , Shuhan Liu , Peijing Li , Philip Levis , Subhasish Mitra , Thierry Tambe , David Tennenhouse , H. -S. Philip Wong

Systems for Memory Disaggregation: Challenges & Opportunities

Memory disaggregation addresses memory imbalance in a cluster by decoupling CPU and memory allocations of applications while also increasing the effective memory capacity for (memory-intensive) applications beyond the local memory limit…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-02-07 Anil Yelam