Related papers: TorchSparse: Efficient Point Cloud Inference Engin…

TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs

Sparse convolution plays a pivotal role in emerging workloads, including point cloud processing in AR/VR, autonomous driving, and graph understanding in recommendation systems. Since the computation pattern is sparse and irregular,…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-23 Haotian Tang, Shang Yang, Zhijian Liu, Ke Hong, Zhongming Yu, Xiuyu Li, Guohao Dai, Yu Wang, Song Han

Sparse Convolutions on Continuous Domains for Point Cloud and Event Stream Networks

Image convolutions have been a cornerstone of a great number of deep learning advances in computer vision. The research community is yet to settle on an equivalent operator for sparse, unstructured continuous data like point clouds and…

Computer Vision and Pattern Recognition · Computer Science 2020-12-03 Dominic Jack, Frederic Maire, Simon Denman, Anders Eriksson

FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores

Sparse Matrix-matrix Multiplication (SpMM) and Sampled Dense-dense Matrix Multiplication (SDDMM) are important sparse operators in scientific computing and deep learning. Tensor Core Units (TCUs) enhance modern accelerators with superior…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-17 Jinliang Shi, Shigang Li, Youxuan Xu, Rongtian Fu, Xueying Wang, Tong Wu

An Efficient FPGA Accelerator for Point Cloud

Deep learning-based point cloud processing plays an important role in various vision tasks, such as autonomous driving, virtual reality (VR), and augmented reality (AR). The submanifold sparse convolutional network (SSCN) has been widely…

Signal Processing · Electrical Eng. & Systems 2022-10-17 Zilun Wang, Wendong Mao, Peixiang Yang, Zhongfeng Wang, Jun Lin

SBNet: Sparse Blocks Network for Fast Inference

Conventional deep convolutional neural networks (CNNs) apply convolution operators uniformly in space across all feature maps for hundreds of layers - this incurs a high computational cost for real-time applications. For many problems such…

Computer Vision and Pattern Recognition · Computer Science 2018-06-08 Mengye Ren, Andrei Pokrovsky, Bin Yang, Raquel Urtasun

Optimizing Sparse Convolution on GPUs with CUDA for 3D Point Cloud Processing in Embedded Systems

In recent years, there has been a significant increase in the utilization of deep learning methods, particularly convolutional neural networks (CNNs), which have emerged as the dominant approach in various domains that involve structured…

Machine Learning · Computer Science 2024-04-09 Chester Luo, Kevin Lai

PointAcc: Efficient Point Cloud Accelerator

Deep learning on point clouds plays a vital role in a wide range of applications such as autonomous driving and AR/VR. These applications interact with people in real-time on edge devices and thus require low latency and low energy.…

Hardware Architecture · Computer Science 2021-10-15 Yujun Lin, Zhekai Zhang, Haotian Tang, Hanrui Wang, Song Han

SparsePipe: Parallel Deep Learning for 3D Point Clouds

We propose SparsePipe, an efficient and asynchronous parallelism approach for handling 3D point clouds with multi-GPU training. SparsePipe is built to support 3D sparse data such as point clouds. It achieves this by adopting generalized…

Computer Vision and Pattern Recognition · Computer Science 2020-12-29 Keke Zhai, Pan He, Tania Banerjee, Anand Rangarajan, Sanjay Ranka

TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs

Recently, graph neural networks (GNNs), as the backbone of graph-based machine learning, demonstrate great success in various domains (e.g., e-commerce). However, the performance of GNNs is usually unsatisfactory due to the highly sparse…

Machine Learning · Computer Science 2023-06-02 Yuke Wang, Boyuan Feng, Zheng Wang, Guyue Huang, Yufei Ding

Sparse GPU Kernels for Deep Learning

Scientific workloads have traditionally exploited high levels of sparsity to accelerate computation and reduce memory requirements. While deep neural networks can be made sparse, achieving practical speedups on GPUs is difficult because…

Machine Learning · Computer Science 2020-09-02 Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen

Optimized CNNs for Rapid 3D Point Cloud Object Recognition

This study introduces a method for efficiently detecting objects within 3D point clouds using convolutional neural networks (CNNs). Our approach adopts a unique feature-centric voting mechanism to construct convolutional layers that…

Computer Vision and Pattern Recognition · Computer Science 2024-12-05 Tianyi Lyu, Dian Gu, Peiyuan Chen, Yaoting Jiang, Zhenhong Zhang, Huadong Pang, Li Zhou, Yiping Dong

SPOTS: An Accelerator for Sparse Convolutional Networks Leveraging Systolic General Matrix-Matrix Multiplication

This paper proposes a new hardware accelerator for sparse convolutional neural networks (CNNs) by building a hardware unit to perform the Image to Column (IM2COL) transformation of the input feature map coupled with a systolic array-based…

Hardware Architecture · Computer Science 2021-11-29 Mohammadreza Soltaniyeh, Richard P. Martin, Santosh Nagarakatte

Faster CNNs with Direct Sparse Convolutions and Guided Pruning

Phenomenally successful in practical inference problems, convolutional neural networks (CNN) are widely deployed in mobile devices, data centers, and even supercomputers. The number of parameters needed in CNNs, however, are often large and…

Computer Vision and Pattern Recognition · Computer Science 2017-08-01 Jongsoo Park, Sheng Li, Wei Wen, Ping Tak Peter Tang, Hai Li, Yiran Chen, Pradeep Dubey

OmniSparse: Training-Aware Fine-Grained Sparse Attention for Long-Video MLLMs

Existing sparse attention methods primarily target inference-time acceleration by selecting critical tokens under predefined sparsity patterns. However, they often fail to bridge the training-inference gap and lack the capacity for…

Computer Vision and Pattern Recognition · Computer Science 2025-11-20 Feng Chen, Yefei He, Shaoxuan He, Yuanyu He, Jing Liu, Lequan Lin, Akide Liu, Zhaoyang Li, Jiyuan Zhang, Zhenbang Sun, Bohan Zhuang, Qi Wu

DiffSparse: Accelerating Diffusion Transformers with Learned Token Sparsity

Diffusion models demonstrate outstanding performance in image generation, but their multi-step inference mechanism requires immense computational cost. Previous works accelerate inference by leveraging layer or token cache techniques to…

Computer Vision and Pattern Recognition · Computer Science 2026-04-07 Haowei Zhu, Ji Liu, Ziqiong Liu, Dong Li, Junhai Yong, Bin Wang, Emad Barsoum

Crescent: Taming Memory Irregularities for Accelerating Deep Point Cloud Analytics

3D perception in point clouds is transforming the perception ability of future intelligent machines. Point cloud algorithms, however, are plagued by irregular memory accesses, leading to massive inefficiencies in the memory sub-system,…

Hardware Architecture · Computer Science 2022-04-25 Yu Feng, Gunnar Hammonds, Yiming Gan, Yuhao Zhu

SPADE: Sparse Pillar-based 3D Object Detection Accelerator for Autonomous Driving

3D object detection using point cloud (PC) data is essential for perception pipelines of autonomous driving, where efficient encoding is key to meeting stringent resource and latency requirements. PointPillars, a widely adopted bird's-eye…

Hardware Architecture · Computer Science 2024-01-17 Minjae Lee, Seongmin Park, Hyungmin Kim, Minyong Yoon, Janghwan Lee, Jun Won Choi, Nam Sung Kim, Mingu Kang, Jungwook Choi

Dynasparse: Accelerating GNN Inference through Dynamic Sparsity Exploitation

Graph Neural Network (GNN) inference is used in many real-world applications. Data sparsity in GNN inference, including sparsity in the input graph and the GNN model, offer opportunities to further speed up inference. Also, many pruning…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-03-24 Bingyi Zhang, Viktor Prasanna

Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training

The success of DNN pruning has led to the development of energy-efficient inference accelerators that support pruned models with sparse weight and activation tensors. Because the memory layouts and dataflows in these architectures are…

Neural and Evolutionary Computing · Computer Science 2020-09-24 Dingqing Yang, Amin Ghasemazar, Xiaowei Ren, Maximilian Golub, Guy Lemieux, Mieszko Lis

VSCNN: Convolution Neural Network Accelerator With Vector Sparsity

Hardware accelerator for convolution neural network (CNNs) enables real time applications of artificial intelligence technology. However, most of the accelerators only support dense CNN computations or suffers complex control to support…

Hardware Architecture · Computer Science 2022-05-06 Kuo-Wei Chang, Tian-Sheuan Chang