Related papers: ImaGen: A General Framework for Generating Memory-…

Compiling Halide Programs to Push-Memory Accelerators

Image processing and machine learning applications benefit tremendously from hardware acceleration, but existing compilers target either FPGAs, which sacrifice power and performance for flexible hardware, or ASICs, which rapidly become…

Hardware Architecture · Computer Science 2021-05-28 Qiaoyi Liu, Dillon Huff, Jeff Setter, Maxwell Strange, Kathleen Feng, Kavya Sreedhar, Ziheng Wang, Keyi Zhang, Mark Horowitz, Priyanka Raina, Fredrik Kjolstad

PIMSIM-NN: An ISA-based Simulation Framework for Processing-in-Memory Accelerators

Processing-in-memory (PIM) has shown extraordinary potential in accelerating neural networks. To evaluate the performance of PIM accelerators, we present an ISA-based simulation framework including a dedicated ISA targeting neural networks…

Hardware Architecture · Computer Science 2024-02-29 Xinyu Wang, Xiaotian Sun, Yinhe Han, Xiaoming Chen

Design space exploration for image processing architectures on FPGA targets

Due to the emergence of embedded applications in image and video processing, communication and cryptography, improvement of pictorial information for better human perception like deblurring, denoising in several fields such as satellite…

Hardware Architecture · Computer Science 2014-04-16 Chandrajit Pal, Avik Kotal, Asit Samanta, Amlan Chakrabarti, Ranjan Ghosh

Demystifying Memory Access Patterns of FPGA-Based Graph Processing Accelerators

Recent advances in reprogrammable hardware (e.g., FPGAs) and memory technology (e.g., DDR4, HBM) promise to solve performance problems inherent to graph processing like irregular memory access patterns on traditional hardware (e.g., CPU).…

Hardware Architecture · Computer Science 2021-04-19 Jonas Dann, Daniel Ritter, Holger Fröning

Theoretical Model of Computation and Algorithms for FPGA-based Hardware Accelerators

While FPGAs have been used extensively as hardware accelerators in industrial computation, no theoretical model of computation has been devised for the study of FPGA-based accelerators. In this paper, we present a theoretical model of…

Data Structures and Algorithms · Computer Science 2018-11-19 Martin Hora, Václav Končický, Jakub Tětek

Exploring Memory Access Patterns for Graph Processing Accelerators

Recent trends in business and technology (e.g., machine learning, social network analysis) benefit from storing and processing growing amounts of graph-structured data in databases and data science platforms. FPGAs as accelerators for graph…

Databases · Computer Science 2021-02-09 Jonas Dann, Daniel Ritter, Holger Fröning

RAPIDNN: In-Memory Deep Neural Network Acceleration Framework

Deep neural networks (DNN) have demonstrated effectiveness for various applications such as image processing, video segmentation, and speech recognition. Running state-of-the-art DNNs on current systems mostly relies on either…

Neural and Evolutionary Computing · Computer Science 2019-04-15 Mohsen Imani, Mohammad Samragh, Yeseong Kim, Saransh Gupta, Farinaz Koushanfar, Tajana Rosing

A Survey of FPGA-Based Neural Network Accelerator

Recent researches on neural network have shown significant advantage in machine learning over traditional algorithms based on handcrafted features and models. Neural network is now widely adopted in regions like image, speech and video…

Hardware Architecture · Computer Science 2018-12-07 Kaiyuan Guo, Shulin Zeng, Jincheng Yu, Yu Wang, Huazhong Yang

OpenCLIPER: an OpenCL-based C++ Framework for Overhead-Reduced Medical Image Processing and Reconstruction on Heterogeneous Devices

Medical image processing is often limited by the computational cost of the involved algorithms. Whereas dedicated computing devices (GPUs in particular) exist and do provide significant efficiency boosts, they have an extra cost of use in…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-08-01 Federico Simmross-Wattenberg, Manuel Rodríguez-Cayetano, Javier Royuela-del-Val, Elena Martín-González, Elisa Moya-Sáez, Marcos Martín-Fernández, Carlos Alberola-López

Image processing Application Development on Software Configurable Processor Array

The software configurable processor finds its best use in embedded systems. These processors have on-chip logic like an FPGA (Field Programmable Gate Array) and thus can be configured to implement custom hardware functionality. The digital…

Hardware Architecture · Computer Science 2025-05-13 Ganesh Prabhu, Steevan Rodrigues, Niranjan Chiplunkar, Niranjan U. C

Programming Heterogeneous Systems from an Image Processing DSL

Specialized image processing accelerators are necessary to deliver the performance and energy efficiency required by important applications in computer vision, computational photography, and augmented reality. But creating,…

Software Engineering · Computer Science 2016-11-01 Jing Pu, Steven Bell, Xuan Yang, Jeff Setter, Stephen Richardson, Jonathan Ragan-Kelley, Mark Horowitz

PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR

Deep neural networks (DNNs) are of critical use in different domains. To accelerate DNN computation, tensor compilers are proposed to generate efficient code on different domain-specific accelerators. Existing tensor compilers mainly focus…

Machine Learning · Computer Science 2023-07-12 Zixuan Ma, Haojie Wang, Jingze Xing, Liyan Zheng, Chen Zhang, Huanqi Cao, Kezhao Huang, Shizhi Tang, Penghan Wang, Jidong Zhai

Efficient Use of Limited-Memory Accelerators for Linear Learning on Heterogeneous Systems

We propose a generic algorithmic building block to accelerate training of machine learning models on heterogeneous compute systems. Our scheme allows to efficiently employ compute accelerators such as GPUs and FPGAs for the training of…

Machine Learning · Computer Science 2017-11-08 Celestine Dünner, Thomas Parnell, Martin Jaggi

Automatic Optimization of Hardware Accelerators for Image Processing

In the domain of image processing, often real-time constraints are required. In particular, in safety-critical applications, such as X-ray computed tomography in medical imaging or advanced driver assistance systems in the automotive…

Programming Languages · Computer Science 2015-02-27 Oliver Reiche, Konrad Häublein, Marc Reichenbach, Frank Hannig, Jürgen Teich, Dietmar Fey

MFAGAN: A Compression Framework for Memory-Efficient On-Device Super-Resolution GAN

Generative adversarial networks (GANs) have promoted remarkable advances in single-image super-resolution (SR) by recovering photo-realistic images. However, high memory consumption of GAN-based SR (usually generators) causes performance…

Hardware Architecture · Computer Science 2021-07-28 Wenlong Cheng, Mingbo Zhao, Zhiling Ye, Shuhang Gu

ImageCL: An Image Processing Language for Performance Portability on Heterogeneous Systems

Modern computer systems typically combine multicore CPUs with accelerators like GPUs for improved performance and energy efficiency. However, these systems suffer from poor performance portability, code tuned for one device must be…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-05-23 Thomas L. Falch, Anne C. Elster

OpenHEC: A Framework for Application Programmers to Design FPGA-based Systems

Today, there is a trend to incorporate more intelligence (e.g., vision capabilities) into a wide range of devices, which makes high performance a necessity for computing systems. Furthermore, for embedded systems, low power consumption…

Other Computer Science · Computer Science 2014-08-25 Zhilei Chai, Zhibin Wang, Wenmin Yang, Shuai Ding, Yuanpu Zhang

Comprehensive Optimization of Parametric Kernels for Graphics Processing Units

This work deals with the optimization of computer programs targeting Graphics Processing Units (GPUs). The goal is to lift, from programmers to optimizing compilers, the heavy burden of determining program details that are dependent on the…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-16 Xiaohui Chen, Marc Moreno-Maza, Jeeva Paudel, Ning Xie

A Tool for Automatically Suggesting Source-Code Optimizations for Complex GPU Kernels

Future computing systems, from handhelds to supercomputers, will undoubtedly be more parallel and heterogeneous than today's systems to provide more performance and energy efficiency. Thus, GPUs are increasingly being used to accelerate…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-18 Saeed Taheri, Apan Qasem, Martin Burtscher

Compiling Neural Networks for a Computational Memory Accelerator

Computational memory (CM) is a promising approach for accelerating inference on neural networks (NN) by using enhanced memories that, in addition to storing data, allow computations on them. One of the main challenges of this approach is…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-27 Kornilios Kourtis, Martino Dazzi, Nikolas Ioannou, Tobias Grosser, Abu Sebastian, Evangelos Eleftheriou