Related papers: TorchSparse++: Efficient Training and Inference Fr…

200 papers

Deep learning on point clouds has received increased attention thanks to its wide applications in AR/VR and autonomous driving. These applications require low latency and high accuracy to provide real-time user experience and ensure user…

Machine Learning · Computer Science 2022-04-22 Haotian Tang, Zhijian Liu, Xiuyu Li, Yujun Lin, Song Han

Knowledge graph (KG) learning offers a powerful framework for generating new knowledge and making inferences. Training KG embedding can take a significantly long time, especially for larger datasets. Our analysis shows that the gradient…

Machine Learning · Computer Science 2025-05-01 Md Saidul Hoque Anik, Ariful Azad

Sparse tensors are rapidly becoming critical components of modern deep learning workloads. However, developing high-performance sparse operators can be difficult and tedious, and existing vendor libraries cannot satisfy the escalating…

Machine Learning · Computer Science 2023-02-22 Zihao Ye, Ruihang Lai, Junru Shao, Tianqi Chen, Luis Ceze

Programming high-performance sparse GPU kernels is notoriously difficult, requiring both substantial effort and deep expertise. Sparse compilers aim to simplify this process, but existing systems fall short in two key ways. First, they are…

Programming Languages · Computer Science 2025-10-21 Jaeyeon Won, Willow Ahrens, Joel S. Emer, Saman Amarasinghe

General-purpose Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental kernel in scientific computing and deep learning. The emergence of new matrix computation units such as Tensor Cores (TCs) brings more opportunities for SpMM…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-17 Haisha Zhao, San Li, Jiaheng Wang, Chunbao Zhou, Jue Wang, Zhikuang Xin, Shunde Li, Zhiqiang Liang, Zhijie Pan, Fang Liu, Yan Zeng, Yangang Wang, Xuebin Chi

Sparse matrix-vector and matrix-matrix multiplication (SpMV and SpMM) are fundamental in both conventional (graph analytics, scientific computing) and emerging (sparse DNN, GNN) domains. Workload-balancing and parallel-reduction are…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-15 Guyue Huang, Guohao Dai, Yu Wang, Yufei Ding, Yuan Xie

Sparse matrix-vector multiplication (SpMV) is an essential linear algebra operation that dominates the computing cost in many scientific applications. Due to providing massive parallelism and high memory bandwidth, GPUs are commonly used to…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-02-14 Mina Ashoury, Mohammad Loni, Farshad Khunjush, Masoud Daneshtalab

Rotation equivariant graph neural networks, i.e. networks designed to guarantee certain geometric relations between their inputs and outputs, yield state of the art performance on spatial deep learning tasks. They exhibit high data…

Machine Learning · Computer Science 2025-05-12 Vivek Bharadwaj, Austin Glover, Aydin Buluc, James Demmel

Scientific workloads have traditionally exploited high levels of sparsity to accelerate computation and reduce memory requirements. While deep neural networks can be made sparse, achieving practical speedups on GPUs is difficult because…

Machine Learning · Computer Science 2020-09-02 Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen

Sparse general matrix multiplication (SpGEMM) is an important and expensive computation primitive in many real-world applications. Due to SpGEMM's inherent irregularity and the vast diversity of its input matrices, developing…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-16 Zhaoyang Du, Yijin Guan, Tianchan Guan, Dimin Niu, Linyong Huang, Hongzhong Zheng, Yuan Xie

In recent years, Transformer-based language models have become the standard approach for natural language processing tasks. However, stringent throughput and latency requirements in industrial applications are limiting their adoption. To…

Machine Learning · Computer Science 2023-06-30 Haihao Shen, Hengyu Meng, Bo Dong, Zhe Wang, Ofir Zafrir, Yi Ding, Yu Luo, Hanwen Chang, Qun Gao, Ziheng Wang, Guy Boudoukh, Moshe Wasserblat

Core computations in Graph Neural Network (GNN) training and inference are often mapped to sparse matrix operations such as sparse-dense matrix multiplication (SpMM). These sparse operations are harder to optimize by manual tuning because…

Machine Learning · Computer Science 2024-03-25 Md Saidul Hoque Anik, Pranav Badhe, Rohit Gampa, Ariful Azad

Sparse data structures are commonly used in neural networks to reduce the memory footprint. These data structures are compact but cause irregularities such as random memory accesses, which prevent efficient use of the memory hierarchy. GPUs…

Programming Languages · Computer Science 2025-06-19 Hossein Albakri, Kazem Cheshmi

Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental kernel across scientific computing and machine learning. While prior work accelerates SpMM using Tensor Cores, no existing sparse kernel exploits the asynchronous features of…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-21 Jie Liu, Huanzhi Pu, Zhiru Zhang

Machine learning is increasingly used to improve decisions within branch-and-bound algorithms for mixed-integer programming. Many existing approaches rely on deep learning, which often requires very large training datasets and substantial…

Machine Learning · Computer Science 2026-04-02 Selin Bayramoğlu, George L Nemhauser, Nikolaos V Sahinidis

The rapid growth in the size of deep learning models strains the capabilities of traditional dense computation paradigms. Leveraging sparse computation has become increasingly popular for training and deploying large-scale models, but…

Machine Learning · Computer Science 2024-06-21 Bobby Yan, Alexander J. Root, Trevor Gale, David Broman, Fredrik Kjolstad

This paper presents GPU performance optimization and scaling results for inference models of the Sparse Deep Neural Network Challenge 2020. Demands for network quality have increased rapidly, pushing the size and thus the memory…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-04 Mert Hidayetoglu, Carl Pearson, Vikram Sharma Mailthody, Eiman Ebrahimi, Jinjun Xiong, Rakesh Nagi, Wen-Mei Hwu

As deep learning models scale, sparse computation and specialized dataflow hardware have emerged as powerful solutions to address efficiency. We propose FuseFlow, a compiler that converts sparse machine learning models written in PyTorch to…

Machine Learning · Computer Science 2026-01-27 Rubens Lacouture, Nathan Zhang, Ritvik Sharma, Marco Siracusa, Fredrik Kjolstad, Kunle Olukotun, Olivia Hsu

Training Convolutional Neural Networks (CNNs) usually requires a large number of computational resources. In this paper, SparseTrain is proposed to accelerate CNN training by fully exploiting the sparsity. It mainly involves three…

Computer Vision and Pattern Recognition · Computer Science 2020-07-28 Pengcheng Dai, Jianlei Yang, Xucheng Ye, Xingzhou Cheng, Junyu Luo, Linghao Song, Yiran Chen, Weisheng Zhao

Despite numerous efforts to optimize the performance of Sparse Matrix and Vector Multiplication (SpMV) on modern hardware architectures, little work has been done on its sparse counterpart, Sparse Matrix and Sparse Vector Multiplication…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-12-18 Min Li, Yulong Ao, Chao Yang