Distributed, Parallel, and Cluster Computing · Computer Science
Fast convolution kernels on Pascal GPU with high memory efficiency
Qiong Chang, Masaki Onishi, Tsutomu Maruyama
2022-12-02
Distributed, Parallel, and Cluster Computing · Computer Science
cuConv: A CUDA Implementation of Convolution for CNN Inference
Marc Jordà, Pedro Valero-Lara, Antonio J. Peña
2024-10-28
Machine Learning · Computer Science
A Design Methodology for Efficient Implementation of Deconvolutional Neural Networks on an FPGA
Xinyu Zhang, Srinjoy Das, Ojash Neopane, Ken Kreutz-Delgado
2017-05-09
Computer Vision and Pattern Recognition · Computer Science
Energy-based Tuning of Convolutional Neural Networks on Multi-GPUs
Francisco M. Castro, Nicolás Guil, Manuel J. Marín-Jiménez, Jesús Pérez-Serrano +1
2018-08-02
Distributed, Parallel, and Cluster Computing · Computer Science
Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs
Chao Li, Yi Yang, Min Feng, Srimat Chakradhar +1
2016-10-13
Distributed, Parallel, and Cluster Computing · Computer Science
Optimizing Memory Efficiency for Convolution Kernels on Kepler GPUs
Xiaoming Chen, Jianxu Chen, Danny Z. Chen, Xiaobo Sharon Hu
2017-05-31
Machine Learning · Computer Science
μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching
Yosuke Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka
2018-04-16
Computer Vision and Pattern Recognition · Computer Science
Computation-Performance Optimization of Convolutional Neural Networks with Redundant Kernel Removal
Chih-Ting Liu, Yi-Heng Wu, Yu-Sheng Lin, Shao-Yi Chien
2018-04-11
Computer Vision and Pattern Recognition · Computer Science
Deep Tensor Convolution on Multicores
David Budden, Alexander Matveev, Shibani Santurkar, Shraman Ray Chaudhuri +1
2017-06-13
Machine Learning · Computer Science
MaxK-GNN: Extremely Fast GPU Kernel Design for Accelerating Graph Neural Networks Training
Hongwu Peng, Xi Xie, Kaustubh Shivdikar, MD Amit Hasan +5
2024-03-20
Distributed, Parallel, and Cluster Computing · Computer Science
ZNNi - Maximizing the Inference Throughput of 3D Convolutional Networks on Multi-Core CPUs and GPUs
Aleksandar Zlateski, Kisuk Lee, H. Sebastian Seung
2016-06-21
Distributed, Parallel, and Cluster Computing · Computer Science
Anatomy Of High-Performance Deep Learning Convolutions On SIMD Architectures
Evangelos Georganas, Sasikanth Avancha, Kunal Banerjee, Dhiraj Kalamkar +3
2018-08-22
Distributed, Parallel, and Cluster Computing · Computer Science
Accelerating convolutional neural network by exploiting sparsity on GPUs
Weizhi Xu, Yintai Sun, Shengyu Fan, Hui Yu +1
2023-08-01
Signal Processing · Electrical Eng. & Systems
Optimizing Temporal Convolutional Network inference on FPGA-based accelerators
Marco Carreras, Gianfranco Deriu, Luigi Raffo, Luca Benini +1
2020-05-11
Machine Learning · Computer Science
Towards Ultra-High Performance and Energy Efficiency of Deep Learning Systems: An Algorithm-Hardware Co-Optimization Framework
Yanzhi Wang, Caiwen Ding, Zhe Li, Geng Yuan +7
2018-02-20
Robotics · Computer Science
MobileXNet: An Efficient Convolutional Neural Network for Monocular Depth Estimation
Xingshuai Dong, Matthew A. Garratt, Sreenatha G. Anavatti, Hussein A. Abbass
2021-11-25
Distributed, Parallel, and Cluster Computing · Computer Science
Towards a Uniform Architecture for the Efficient Implementation of 2D and 3D Deconvolutional Neural Networks on FPGAs
Deguang Wang, Junzhong Shen, Mei Wen, Chunyuan Zhang
2019-03-08
Machine Learning · Computer Science
A deep Convolutional Neural Network for topology optimization with strong generalization ability
Yiquan Zhang, Bo Peng, Xiaoyi Zhou, Cheng Xiang +1
2020-04-01
Machine Learning · Computer Science
Energy-efficient Deployment of Deep Learning Applications on Cortex-M based Microcontrollers using Deep Compression
Mark Deutel, Philipp Woller, Christopher Mutschler, Jürgen Teich
2023-07-14
Hardware Architecture · Computer Science
HybridDNN: A Framework for High-Performance Hybrid DNN Accelerator Design and Implementation
Hanchen Ye, Xiaofan Zhang, Zhize Huang, Gengsheng Chen +1
2020-04-09
Machine Learning · Computer Science
Efficient and Generic 1D Dilated Convolution Layer for Deep Learning
Narendra Chaudhary, Sanchit Misra, Dhiraj Kalamkar, Alexander Heinecke +4
2021-04-19