Related papers: Fast Convolutional Nets With fbfft: A GPU Performa…

Fast Training of Convolutional Networks through FFTs

Convolutional networks are one of the most widely employed architectures in computer vision and machine learning. In order to leverage their ability to learn complex functions, large amounts of data are required for training. Training a…

Computer Vision and Pattern Recognition · Computer Science 2015-06-09 Michael Mathieu, Mikael Henaff, Yann LeCun

tcFFT: Accelerating Half-Precision FFT through Tensor Cores

Fast Fourier Transform (FFT) is an essential tool in scientific and engineering computation. The increasing demand for mixed-precision FFT has made it possible to utilize half-precision floating-point (FP16) arithmetic for faster speed and…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-04-26 Binrui Li, Shenggan Cheng, James Lin

Towards On-Chip Optical FFTs for Convolutional Neural Networks

Convolutional neural networks have become an essential element of spatial deep learning systems. In the prevailing architecture, the convolution operation is performed with Fast Fourier Transforms (FFT) electronically in GPUs. The…

Emerging Technologies · Computer Science 2017-09-01 Jonathan George, Hani Nejadriahi, Volker Sorger

Fast Fourier Transformation for Optimizing Convolutional Neural Networks in Object Recognition

This paper proposes to use a Fast Fourier Transformation-based U-Net (a refined fully convolutional network) and perform image convolution in neural networks. Leveraging the Fast Fourier Transformation, it reduces the image convolution costs…

Computer Vision and Pattern Recognition · Computer Science 2020-10-12 Varsha Nair, Moitrayee Chatterjee, Neda Tavakoli, Akbar Siami Namin, Craig Snoeyink

TurboFFT: Co-Designed High-Performance and Fault-Tolerant Fast Fourier Transform on GPUs

GPU-based fast Fourier transform (FFT) is extremely important for scientific computing and signal processing. However, we find the inefficiency of existing FFT libraries and the absence of fault tolerance against soft error. To address…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-10 Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Huangliang Dai, Sheng Di, Franck Cappello, Zizhong Chen

cuConv: A CUDA Implementation of Convolution for CNN Inference

Convolutions are the core operation of deep learning applications based on Convolutional Neural Networks (CNNs). Current GPU architectures are highly efficient for training and deploying deep CNNs, and hence, these are largely used in…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-28 Marc Jordà, Pedro Valero-Lara, Antonio J. Peña

Integrated Photonic FFT for Optical Convolutions towards Efficient and High-Speed Neural Networks

The technologically-relevant task of feature extraction from data performed in deep-learning systems is routinely accomplished as repeated fast Fourier transforms (FFT) electronically in prevalent domain-specific architectures such as in…

Optics · Physics 2020-03-25 Moustafa Ahmed, Yas Al-Hadeethi, Ahmed Bakry, Hamed Dalir, Volker J. Sorger

Acceleration of Convolutional Neural Network Using FFT-Based Split Convolutions

Convolutional neural networks (CNNs) have a large number of variables and hence suffer from a complexity problem for their implementation. Different methods and techniques have been developed to alleviate the problem of CNN's complexity, such as…

Computer Vision and Pattern Recognition · Computer Science 2020-04-07 Kamran Chitsaz, Mohsen Hajabdollahi, Nader Karimi, Shadrokh Samavi, Shahram Shirani

Face Recognition with Hybrid Efficient Convolution Algorithms on FPGAs

Deep Convolutional Neural Networks have become a Swiss Army knife for solving critical artificial intelligence tasks. However, deploying deep CNN models for latency-critical tasks remains challenging because of the complex nature of CNNs.…

Computer Vision and Pattern Recognition · Computer Science 2018-03-28 Chuanhao Zhuge, Xinheng Liu, Xiaofan Zhang, Sudeep Gummadi, Jinjun Xiong, Deming Chen

A GPU-Outperforming FPGA Accelerator Architecture for Binary Convolutional Neural Networks

FPGA-based hardware accelerators for convolutional neural networks (CNNs) have attracted great attention due to their higher energy efficiency than GPUs. However, it is challenging for FPGA-based solutions to achieve a higher throughput…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-06-09 Yixing Li, Zichuan Liu, Kai Xu, Hao Yu, Fengbo Ren

FFT Convolutions are Faster than Winograd on Modern CPUs, Here is Why

Winograd-based convolution has quickly gained traction as a preferred approach to implement convolutional neural networks (ConvNet) on various hardware platforms because it requires fewer floating point operations than FFT-based or direct…

Performance · Computer Science 2018-09-24 Aleksandar Zlateski, Zhen Jia, Kai Li, Fredo Durand

Fast convolution kernels on pascal GPU with high memory efficiency

The convolution computation is widely used in many fields, especially in CNNs. Because of the rapid growth of the training data in CNNs, GPUs have been used for the acceleration, and memory-efficient algorithms are focused because of their…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-12-02 Qiong Chang, Masaki Onishi, Tsutomu Maruyama

FPGA-based Acceleration for Convolutional Neural Networks: A Comprehensive Review

Convolutional Neural Networks (CNNs) are fundamental to deep learning, driving applications across various domains. However, their growing complexity has significantly increased computational demands, necessitating efficient hardware…

Machine Learning · Computer Science 2025-05-21 Junye Jiang, Yaan Zhou, Yuanhao Gong, Haoxuan Yuan, Shuanglong Liu

QFCNN: Quantum Fourier Convolutional Neural Network

The neural network and quantum computing are both significant and appealing fields, with their interactive disciplines promising for large-scale computing tasks that are untackled by conventional computers. However, both developments are…

Quantum Physics · Physics 2021-06-22 Feihong Shen, Jun Liu

Phasor-Driven Acceleration for FFT-based CNNs

Recent research in deep learning (DL) has investigated the use of the Fast Fourier Transform (FFT) to accelerate the computations involved in Convolutional Neural Networks (CNNs) by replacing spatial convolution with element-wise…

Computer Vision and Pattern Recognition · Computer Science 2024-06-05 Eduardo Reis, Thangarajah Akilan, Mohammed Khalid

Accelerating convolutional neural network by exploiting sparsity on GPUs

Convolutional neural network (CNN) is an important deep learning method. The convolution operation takes a large proportion of the total execution time for CNN. Feature maps for convolution operation are usually sparse. Multiplications and…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-01 Weizhi Xu, Yintai Sun, Shengyu Fan, Hui Yu, Xin Fu

fpgaConvNet: A Toolflow for Mapping Diverse Convolutional Neural Networks on Embedded FPGAs

In recent years, Convolutional Neural Networks (ConvNets) have become an enabling technology for a wide range of novel embedded Artificial Intelligence systems. Across the range of applications, the performance needs vary significantly,…

Computer Vision and Pattern Recognition · Computer Science 2017-11-27 Stylianos I. Venieris, Christos-Savvas Bouganis

AccFFT: A library for distributed-memory FFT on CPU and GPU architectures

We present a new library for parallel distributed Fast Fourier Transforms (FFT). The importance of FFT in science and engineering and the advances in high performance computing necessitate further improvements. AccFFT extends existing FFT…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-05-27 Amir Gholami, Judith Hill, Dhairya Malhotra, George Biros

Depthwise Spatio-Temporal STFT Convolutional Neural Networks for Human Action Recognition

Conventional 3D convolutional neural networks (CNNs) are computationally expensive, memory intensive, prone to overfitting, and most importantly, there is a need to improve their feature learning capabilities. To address these issues, we…

Computer Vision and Pattern Recognition · Computer Science 2021-05-05 Sudhakar Kumawat, Manisha Verma, Yuta Nakashima, Shanmuganathan Raman

FFT-Based Deep Learning Deployment in Embedded Systems

Deep learning has demonstrated its power in many application domains, especially in image and speech recognition. As the backbone of deep learning, deep neural networks (DNNs) consist of multiple layers of various types with hundreds to…

Machine Learning · Computer Science 2017-12-14 Sheng Lin, Ning Liu, Mahdi Nazemi, Hongjia Li, Caiwen Ding, Yanzhi Wang, Massoud Pedram