Related papers: A GPU Based Memory Optimized Parallel Method For F…

200 papers

Study of general purpose computation by GPU (Graphics Processing Unit) can improve the image processing capability of micro-computer system. This paper studies the parallelism of the different stages of decimation in time radix 2 FFT…

Mathematical Software · Computer Science 2015-06-01 Feifei Shen, Zhenjian Song, Congrui Wu, Jiaqi Geng, Qingyun Wang

This paper evaluates the efficacy of recent commercial processing-in-memory (PIM) solutions to accelerate fast Fourier transform (FFT), an important primitive across several domains. Specifically, we observe that efficient implementations…

Hardware Architecture · Computer Science 2023-08-09 Mohamed Assem Ibrahim, Shaizeen Aga

We present a new library for parallel distributed Fast Fourier Transforms (FFT). The importance of FFT in science and engineering and the advances in high performance computing necessitate further improvements. AccFFT extends existing FFT…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-05-27 Amir Gholami, Judith Hill, Dhairya Malhotra, George Biros

The Discrete Fourier Transform (DFT) is essential for various applications ranging from signal processing to convolution and polynomial multiplication. The groundbreaking Fast Fourier Transform (FFT) algorithm reduces DFT time complexity…

Hardware Architecture · Computer Science 2023-04-06 Orian Leitersdorf, Yahav Boneh, Gonen Gazit, Ronny Ronen, Shahar Kvatinsky

Edge devices are being deployed at increasing volumes to sense and act on information from the physical world. The discrete Fourier transform (DFT) is often necessary to make this sensed data suitable for further processing -- such as by…

Matrix factorization (MF) is employed by many popular algorithms, e.g., collaborative filtering. The emerging GPU technology, with massively multicore and high intra-chip memory bandwidth but limited memory capacity, presents an opportunity…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-25 Wei Tan, Liangliang Cao, Liana Fong

We present an implementation of the overlap-and-save method, a method for the convolution of very long signals with short response functions, which is tailored to GPUs. We have implemented several FFT algorithms (using the CUDA programming…

Mathematical Software · Computer Science 2022-11-08 Karel Adámek, Sofia Dimoudi, Mike Giles, Wesley Armour

There has been considerable research into improving Fast Fourier Transform (FFT) performance through parallelization and optimization for specialized hardware. However, even with those advancements, processing of very large files, over 1TB…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-07-28 Rostislav Tsiomenko, Bradley S. Rees

Convolutional neural networks have become an essential element of spatial deep learning systems. In the prevailing architecture, the convolution operation is performed with Fast Fourier Transforms (FFT) electronically in GPUs. The…

Emerging Technologies · Computer Science 2017-09-01 Jonathan George, Hani Nejadriahi, Volker Sorger

This paper explores practical aspects of using a high-level functional language for GPU-based arithmetic on "midsize" integers. By this we mean integers of up to about a quarter million bits, which is sufficient for most practical…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-24 Cosmin E. Oancea, Stephen M. Watt

We provide a preliminary study on utilizing GPU (Graphics Processing Unit) to accelerate computation for three simulation optimization tasks with either first-order or second-order algorithms. Compared to the implementation using only CPU…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-19 Jinghai He, Haoyu Liu, Yuhang Wu, Zeyu Zheng, Tingyu Zhu

GPU-based fast Fourier transform (FFT) is extremely important for scientific computing and signal processing. However, we find the inefficiency of existing FFT libraries and the absence of fault tolerance against soft error. To address…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-10 Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Huangliang Dai, Sheng Di, Franck Cappello, Zizhong Chen

Fast Fourier Transform (FFT) is an essential tool in scientific and engineering computation. The increasing demand for mixed-precision FFT has made it possible to utilize half-precision floating-point (FP16) arithmetic for faster speed and…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-04-26 Binrui Li, Shenggan Cheng, James Lin

General-purpose multiprocessors (as, in our case, Intel IvyBridge and Intel Haswell) increasingly add GPU computing power to the former multicore architectures. When used for embedded applications (for us, Synthetic aperture radar) with…

Mathematical Software · Computer Science 2015-06-01 Mohamed Amine Bergach, Emilien Kofman, Robert de Simone, Serge Tissot, Michel Syska

In this paper, we use multithreaded fast Fourier transforms provided in three highly optimized packages, FFTW-2.1.5, FFTW-3.3.7, and Intel MKL FFT, to present a novel model-based parallel computing technique as a very effective and portable…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-08-17 Semyon Khokhriakov, Ravi Reddy, Alexey Lastovetsky

In the field of digital signal processing, the fast Fourier transform (FFT) is a fundamental algorithm, with its processors being implemented using either the pipelined architecture, well-known for high-throughput applications but weak in…

Hardware Architecture · Computer Science 2025-01-03 Fangyu Zhao, Chunhua Xiao, Zhiguo Wang, Xiaohua Du, Bo Dong

Matrix Factorization (MF) on large-scale data takes substantial time on a Central Processing Unit (CPU). While Graphics Processing Units (GPUs) could expedite the computation of MF, the available memory on a GPU is finite. Leveraging GPUs…

Machine Learning · Computer Science 2023-04-28 Prasad Bhavana, Vineet Padmanabhan

The FFT of three-dimensional (3D) input data is an important computational kernel of numerical simulations and is widely used in High Performance Computing (HPC) codes running on a large number of processors. Performance of many scientific…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-08-28 Vivek Gavane, Supriya Prabhugawankar, Shivam Garg, Archana Achalere, Rajendra Joshi

The General-Purpose Graphics Processing Unit (GPGPU) is widely used to achieve high performance or high throughput in parallel programming. This capability of GPGPUs is well known and is mostly used for scientific computing which…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-10-10 Vajira Thambawita, Roshan G. Ragel, Dhammike Elkaduwe

Modeling data sharing in GPU programs is a challenging task because of the massive parallelism and complex data sharing patterns provided by GPU architectures. Better GPU caching efficiency can be achieved through careful task scheduling…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-04 Lingda Li, Ari B. Hayes, Stephen A. Hackler, Eddy Z. Zhang, Mario Szegedy, Shuaiwen Leon Song