Related papers: Programmable FPGA-based Memory Controller

The Memory Controller Wall: Benchmarking the Intel FPGA SDK for OpenCL Memory Interface

Supported by their high power efficiency and recent advancements in High Level Synthesis (HLS), FPGAs are quickly finding their way into HPC and cloud systems. Large amounts of work have been done so far on loop and area optimizations for…

Hardware Architecture · Computer Science 2020-02-17 Hamid Reza Zohouri, Satoshi Matsuoka

Exploring Memory Access Patterns for Graph Processing Accelerators

Recent trends in business and technology (e.g., machine learning, social network analysis) benefit from storing and processing growing amounts of graph-structured data in databases and data science platforms. FPGAs as accelerators for graph…

Databases · Computer Science 2021-02-09 Jonas Dann, Daniel Ritter, Holger Fröning

A Flexible High-Bandwidth Low-Latency Multi-Port Memory Controller

Multi-port memory controllers (MPMCs) have become increasingly important in many modern applications due to the tremendous growth in bandwidth requirement. Many approaches so far have focused on improving either the memory access latency or…

Hardware Architecture · Computer Science 2018-06-12 Xuan-Thuan Nguyen, Duc-Hung Le, Trong-Tu Bui, Huu-Thuan Huynh, Cong-Kha Pham

A Benchmarking Platform for DDR4 Memory Performance in Data-Center-Class FPGAs

FPGAs are increasingly utilized in data centers due to their capacity to exploit data parallelism in computationally intensive workloads. Furthermore, the processing of modern data center workloads requires moving vast amounts of data,…

Hardware Architecture · Computer Science 2025-07-02 Andrea Galimberti, Gabriele Montanaro, Andrea Motta, Federico Proverbio, Davide Zoni

An Irredundant and Compressed Data Layout to Optimize Bandwidth Utilization of FPGA Accelerators

Memory bandwidth is known to be a performance bottleneck for FPGA accelerators, especially when they deal with large multi-dimensional data-sets. A large body of work focuses on reducing off-chip transfers, but few authors try to improve…

Hardware Architecture · Computer Science 2024-01-23 Corentin Ferry, Nicolas Derumigny, Steven Derrien, Sanjay Rajopadhye

Increasing FPGA Accelerators Memory Bandwidth with a Burst-Friendly Memory Layout

Offloading compute-intensive kernels to hardware accelerators relies on the large degree of parallelism offered by these platforms. However, the effective bandwidth of the memory interface often causes a bottleneck, hindering the…

Hardware Architecture · Computer Science 2022-02-25 Corentin Ferry, Tomofumi Yuki, Steven Derrien, Sanjay Rajopadhye

Demystifying Memory Access Patterns of FPGA-Based Graph Processing Accelerators

Recent advances in reprogrammable hardware (e.g., FPGAs) and memory technology (e.g., DDR4, HBM) promise to solve performance problems inherent to graph processing like irregular memory access patterns on traditional hardware (e.g., CPU).…

Hardware Architecture · Computer Science 2021-04-19 Jonas Dann, Daniel Ritter, Holger Fröning

METICULOUS: An FPGA-based Main Memory Emulator for System Software Studies

Due to the scaling problem of the DRAM technology, non-volatile memory devices, which are based on a different principle of operation than DRAM, are now being intensively developed to expand the main memory of computers. Disaggregated memory…

Hardware Architecture · Computer Science 2023-09-14 Takahiro Hirofuchi, Takaaki Fukai, Akram Ben Ahmed, Ryousei Takano, Kento Sato

Online Application Guidance for Heterogeneous Memory Systems

Many high-end and next-generation computing systems incorporate alternative memory technologies to meet performance goals. Since these technologies present distinct advantages and tradeoffs compared to conventional DDR* SDRAM, such as…

Performance · Computer Science 2021-10-06 M. Ben Olson, Brandon Kammerdiener, Kshitij A. Doshi, Terry Jones, Michael R. Jantz

A flexible FPGA accelerator for convolutional neural networks

Though CNNs are highly parallel workloads, in the absence of efficient on-chip memory reuse techniques, an accelerator for them quickly becomes memory bound. In this paper, we propose a CNN accelerator design for inference that is able to…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-26 Kingshuk Majumder, Shubham Nema, Uday Bondhugula

DX100: A Programmable Data Access Accelerator for Indirection

Indirect memory accesses frequently appear in applications where memory bandwidth is a critical bottleneck. Prior indirect memory access proposals, such as indirect prefetchers, runahead execution, fetchers, and decoupled access/execute…

Hardware Architecture · Computer Science 2025-06-03 Alireza Khadem, Kamalavasan Kamalakkannan, Zhenyan Zhu, Akash Poptani, Yufeng Gu, Jered Benjamin Dominguez-Trujillo, Nishil Talati, Daichi Fujiki, Scott Mahlke, Galen Shipman, Reetuparna Das

Analytical Model of Memory-Bound Applications Compiled with High Level Synthesis

The increasing demand for dedicated accelerators to improve energy efficiency and performance has highlighted FPGAs as a promising option to deliver both. However, programming FPGAs in hardware description languages requires a long time and…

Hardware Architecture · Computer Science 2020-03-31 Maria A. Dávila-Guzmán, Rubén Gran Tejero, María Villarroya-Gaudó, Darío Suárez Gracia

On the Off-chip Memory Latency of Real-Time Systems: Is DDR DRAM Really the Best Option?

Predictable execution time upon accessing shared memories in multi-core real-time systems is a stringent requirement. A plethora of existing works focus on the analysis of Double Data Rate Dynamic Random Access Memories (DDR DRAMs), or…

Hardware Architecture · Computer Science 2018-10-17 Mohamed Hassan

Medusa: A Scalable Interconnect for Many-Port DNN Accelerators and Wide DRAM Controller Interfaces

To cope with the increasing demand and computational intensity of deep neural networks (DNNs), industry and academia have turned to accelerator technologies. In particular, FPGAs have been shown to provide a good balance between performance…

Hardware Architecture · Computer Science 2018-07-12 Yongming Shen, Tianchu Ji, Michael Ferdman, Peter Milder

Memory Controller Design Under Cloud Workloads

This work studies the behavior of state-of-the-art memory controller designs when executing scale-out workloads. It considers memory scheduling techniques, memory page management policies, the number of memory channels, and the address…

Hardware Architecture · Computer Science 2016-12-01 Mostafa Mahmoud, Andreas Moshovos

Addressing memory bandwidth scalability in vector processors for streaming applications

As the size of artificial intelligence and machine learning (AI/ML) models and datasets grows, the memory bandwidth becomes a critical bottleneck. The paper presents a novel extended memory hierarchy that addresses some major memory…

Hardware Architecture · Computer Science 2025-05-20 Jordi Altayo, Paul Delestrac, David Novo, Simey Yang, Debjyoti Bhattacharjee, Francky Catthoor

Modular Neural Computer

This paper introduces the Modular Neural Computer (MNC), a memory-augmented neural architecture for exact algorithmic computation on variable-length inputs. The model combines an external associative memory of scalar cells, explicit read…

Machine Learning · Computer Science 2026-03-17 Florin Leon

Embedded Online Optimization for Model Predictive Control at Megahertz Rates

Faster, cheaper, and more power efficient optimization solvers than those currently offered by general-purpose solutions are required for extending the use of model predictive control (MPC) to resource-constrained embedded platforms. We…

Systems and Control · Computer Science 2017-10-13 Juan L. Jerez, Paul J. Goulart, Stefan Richter, George A. Constantinides, Eric C. Kerrigan, Manfred Morari

f-CNN$^{\text{x}}$: A Toolflow for Mapping Multi-CNN Applications on FPGAs

The predictive power of Convolutional Neural Networks (CNNs) has been an integral factor for emerging latency-sensitive applications, such as autonomous drones and vehicles. Such systems employ multiple CNNs, each one trained for a…

Computer Vision and Pattern Recognition · Computer Science 2021-06-09 Stylianos I. Venieris, Christos-Savvas Bouganis

A Configurable and Efficient Memory Hierarchy for Neural Network Hardware Accelerator

As machine learning applications continue to evolve, the demand for efficient hardware accelerators, specifically tailored for deep neural networks (DNNs), becomes increasingly vital. In this paper, we propose a configurable memory…

Hardware Architecture · Computer Science 2024-04-25 Oliver Bause, Paul Palomero Bernardo, Oliver Bringmann