Related papers: Programmable FPGA-based Memory Controller
Supported by their high power efficiency and recent advancements in High Level Synthesis (HLS), FPGAs are quickly finding their way into HPC and cloud systems. Large amounts of work have been done so far on loop and area optimizations for…
Recent trends in business and technology (e.g., machine learning, social network analysis) benefit from storing and processing growing amounts of graph-structured data in databases and data science platforms. FPGAs as accelerators for graph…
Multi-port memory controllers (MPMCs) have become increasingly important in many modern applications due to the tremendous growth in bandwidth requirement. Many approaches so far have focused on improving either the memory access latency or…
FPGAs are increasingly utilized in data centers due to their capacity to exploit data parallelism in computationally intensive workloads. Furthermore, the processing of modern data center workloads requires moving vast amounts of data,…
Memory bandwidth is known to be a performance bottleneck for FPGA accelerators, especially when they deal with large multi-dimensional data-sets. A large body of work focuses on reducing of off-chip transfers, but few authors try to improve…
Offloading compute-intensive kernels to hardware accelerators relies on the large degree of parallelism offered by these platforms. However, the effective bandwidth of the memory interface often causes a bottleneck, hindering the…
Recent advances in reprogrammable hardware (e.g., FPGAs) and memory technology (e.g., DDR4, HBM) promise to solve performance problems inherent to graph processing like irregular memory access patterns on traditional hardware (e.g., CPU).…
Due to the scaling problem of the DRAM technology, non-volatile memory devices, which are based on different principle of operation than DRAM, are now being intensively developed to expand the main memory of computers. Disaggregated memory…
Many high end and next generation computing systems to incorporated alternative memory technologies to meet performance goals. Since these technologies present distinct advantages and tradeoffs compared to conventional DDR* SDRAM, such as…
Though CNNs are highly parallel workloads, in the absence of efficient on-chip memory reuse techniques, an accelerator for them quickly becomes memory bound. In this paper, we propose a CNN accelerator design for inference that is able to…
Indirect memory accesses frequently appear in applications where memory bandwidth is a critical bottleneck. Prior indirect memory access proposals, such as indirect prefetchers, runahead execution, fetchers, and decoupled access/execute…
The increasing demand of dedicated accelerators to improve energy efficiency and performance has highlighted FPGAs as a promising option to deliver both. However, programming FPGAs in hardware description languages requires long time and…
Predictable execution time upon accessing shared memories in multi-core real-time systems is a stringent requirement. A plethora of existing works focus on the analysis of Double Data Rate Dynamic Random Access Memories (DDR DRAMs), or…
To cope with the increasing demand and computational intensity of deep neural networks (DNNs), industry and academia have turned to accelerator technologies. In particular, FPGAs have been shown to provide a good balance between performance…
This work studies the behavior of state-of-the-art memory controller designs when executing scale-out workloads. It considers memory scheduling techniques, memory page management policies, the number of memory channels, and the address…
As the size of artificial intelligence and machine learning (AI/ML) models and datasets grows, the memory bandwidth becomes a critical bottleneck. The paper presents a novel extended memory hierarchy that addresses some major memory…
This paper introduces the Modular Neural Computer (MNC), a memory-augmented neural architecture for exact algorithmic computation on variable-length inputs. The model combines an external associative memory of scalar cells, explicit read…
Faster, cheaper, and more power efficient optimization solvers than those currently offered by general-purpose solutions are required for extending the use of model predictive control (MPC) to resource-constrained embedded platforms. We…
The predictive power of Convolutional Neural Networks (CNNs) has been an integral factor for emerging latency-sensitive applications, such as autonomous drones and vehicles. Such systems employ multiple CNNs, each one trained for a…
As machine learning applications continue to evolve, the demand for efficient hardware accelerators, specifically tailored for deep neural networks (DNNs), becomes increasingly vital. In this paper, we propose a configurable memory…