Related papers: Demystifying Memory Access Patterns of FPGA-Based …
Recent trends in business and technology (e.g., machine learning, social network analysis) benefit from storing and processing growing amounts of graph-structured data in databases and data science platforms. FPGAs as accelerators for graph…
Graph is a well known data structure to represent the associated relationships in a variety of applications, e.g., data science and machine learning. Despite a wealth of existing efforts on developing graph processing systems for improving…
In this paper, we explore the limits of graphics processors (GPUs) for general purpose parallel computing by studying problems that require highly irregular data access patterns: parallel graph algorithms for list ranking and connected…
FPGAs are increasingly utilized in data centers due to their capacity to exploit data parallelism in computationally intensive workloads. Furthermore, the processing of modern data center workloads requires moving vast amounts of data,…
Recent advances in graph processing on FPGAs promise to alleviate performance bottlenecks with irregular memory access patterns. Such bottlenecks challenge performance for a growing number of important application areas like machine…
The growing complexity of computational workloads has amplified the need for efficient and specialized hardware accelerators. Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) have emerged as prominent solutions,…
Due to the irregular nature of connections in most graph datasets, partitioning graph analysis algorithms across multiple computational nodes that do not share a common memory inevitably leads to large amounts of interconnect traffic.…
Recent researches on neural network have shown significant advantage in machine learning over traditional algorithms based on handcrafted features and models. Neural network is now widely adopted in regions like image, speech and video…
While FPGAs have been used extensively as hardware accelerators in industrial computation, no theoretical model of computation has been devised for the study of FPGA-based accelerators. In this paper, we present a theoretical model of…
With the rapid development of in-depth learning, neural network and deep learning algorithms have been widely used in various fields, e.g., image, video and voice processing. However, the neural network model is getting larger and larger,…
Graph algorithms are increasingly used in applications that exploit large databases. However, conventional processor architectures are inadequate for handling the throughput and memory requirements of graph computation. Lincoln Laboratory's…
Processing large-scale graph datasets is computationally intensive and time-consuming. Processor-centric CPU and GPU architectures, commonly used for graph applications, often face bottlenecks caused by extensive data movement between the…
Graph accelerators have emerged as a promising solution for processing large-scale sparse graphs, leveraging the in-situ compu-tation of ReRAM-based crossbars to maximize computational efficiency. However, existing designs suffer from…
Many artificial intelligence (AI) devices have been developed to accelerate the training and inference of neural networks models. The most common ones are the Graphics Processing Unit (GPU) and Tensor Processing Unit (TPU). They are highly…
Graph processing is typically considered to be a memory-bound rather than compute-bound problem. One common line of thought is that more available memory bandwidth corresponds to better graph processing performance. However, in this work we…
Due to the emergence of embedded applications in image and video processing, communication and cryptography, improvement of pictorial information for better human perception like deblurring, denoising in several fields such as satellite…
High Performance Computing (HPC) platforms allow scientists to model computationally intensive algorithms. HPC clusters increasingly use General-Purpose Graphics Processing Units (GPGPUs) as accelerators; FPGAs provide an attractive…
Graphics processing units (GPU) had evolved from a specialized hardware capable to render high quality graphics in games to a commodity hardware for effective processing blocks of data in a parallel schema. This evolution is particularly…
Due to recent advances in digital technologies, and availability of credible data, an area of artificial intelligence, deep learning, has emerged, and has demonstrated its ability and effectiveness in solving complex learning problems not…
Even with generational improvements in DRAM technology, memory access latency still remains the major bottleneck for application accelerators, primarily due to limitations in memory interface IPs which cannot fully account for variations in…