Related papers: Efficient Dynamic Structured Sparse Training with …

Dynamic Sparse Training with Structured Sparsity

Dynamic Sparse Training (DST) methods achieve state-of-the-art results in sparse neural network training, matching the generalization of dense models while enabling sparse training and inference. Although the resulting models are highly…

Machine Learning · Computer Science 2024-02-23 Mike Lasby, Anna Golubeva, Utku Evci, Mihai Nica, Yani Ioannou

Dynamic Sparse Training of Diagonally Sparse Networks

Recent advances in Dynamic Sparse Training (DST) have pushed the frontier of sparse neural network training in structured and unstructured contexts, matching dense-model performance while drastically reducing parameter counts to facilitate…

Machine Learning · Computer Science 2025-06-16 Abhishek Tyagi, Arjun Iyer, William H Renninger, Christopher Kanan, Yuhao Zhu

Dynamic Sparsity Is Channel-Level Sparsity Learner

Sparse training has received an upsurging interest in machine learning due to its tantalizing saving potential for the entire training process as well as inference. Dynamic sparse training (DST), as a leading sparse training approach, can…

Machine Learning · Computer Science 2023-11-13 Lu Yin, Gen Li, Meng Fang, Li Shen, Tianjin Huang, Zhangyang Wang, Vlado Menkovski, Xiaolong Ma, Mykola Pechenizkiy, Shiwei Liu

Navigating Extremes: Dynamic Sparsity in Large Output Spaces

In recent years, Dynamic Sparse Training (DST) has emerged as an alternative to post-training pruning for generating efficient models. In principle, DST allows for a more memory efficient training process, as it maintains sparsity…

Machine Learning · Computer Science 2025-02-11 Nasib Ullah, Erik Schultheis, Mike Lasby, Yani Ioannou, Rohit Babbar

Structurally Sparsified Backward Propagation for Faster Long Short-Term Memory Training

Exploiting sparsity enables hardware systems to run neural networks faster and more energy-efficiently. However, most prior sparsity-centric optimization techniques only accelerate the forward pass of neural networks and usually require an…

Machine Learning · Computer Science 2018-06-05 Maohua Zhu, Jason Clemons, Jeff Pool, Minsoo Rhu, Stephen W. Keckler, Yuan Xie

Learnable Permutation for Structured Sparsity on Transformer Models

Structured sparsity has emerged as a popular model pruning technique, widely adopted in various architectures, including CNNs, Transformer models, and especially large language models (LLMs) in recent years. A promising direction to further…

Machine Learning · Computer Science 2026-02-02 Zekai Li, Ji Liu, Guanchen Li, Yixing Xu, Ziqiong Liu, Xuanwu Yin, Dong Li, Emad Barsoum

Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity

The demand for efficient processing of deep neural networks (DNNs) on embedded devices is a significant challenge limiting their deployment. Exploiting sparsity in the network's feature maps is one of the ways to reduce its inference…

Computer Vision and Pattern Recognition · Computer Science 2023-09-28 Matteo Grimaldi, Darshan C. Ganji, Ivan Lazarevich, Sudhakar Sah

Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks

Unstructured pruning reduces the memory footprint in deep neural networks (DNNs). Recently, researchers proposed different types of structural pruning intending to reduce also the computation complexity. In this work, we first suggest a new…

Artificial Intelligence · Computer Science 2021-10-22 Itay Hubara, Brian Chmiel, Moshe Island, Ron Banner, Seffi Naor, Daniel Soudry

Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch

Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate the models on resource-constrained environments. It can be generally categorized into unstructured fine-grained sparsity that zeroes out multiple…

Computer Vision and Pattern Recognition · Computer Science 2021-04-20 Aojun Zhou, Yukun Ma, Junnan Zhu, Jianbo Liu, Zhijie Zhang, Kun Yuan, Wenxiu Sun, Hongsheng Li

An Algorithm-Hardware Co-Optimized Framework for Accelerating N:M Sparse Transformers

The Transformer has been an indispensable staple in deep learning. However, for real-life applications, it is very challenging to deploy efficient Transformers due to immense parameters and operations of models. To relieve this burden,…

Hardware Architecture · Computer Science 2022-11-01 Chao Fang, Aojun Zhou, Zhongfeng Wang

Sparse Networks from Scratch: Faster Training without Losing Performance

We demonstrate the possibility of what we call sparse learning: accelerated training of deep neural networks that maintain sparse weights throughout training while achieving dense performance levels. We accomplish this by developing sparse…

Machine Learning · Computer Science 2019-08-27 Tim Dettmers, Luke Zettlemoyer

SparseRT: Accelerating Unstructured Sparsity on GPUs for Deep Learning Inference

In recent years, there has been a flurry of research in deep neural network pruning and compression. Early approaches prune weights individually. However, it is difficult to take advantage of the resulting unstructured sparsity patterns on…

Machine Learning · Computer Science 2020-08-28 Ziheng Wang

Sparse Mixture Once-for-all Adversarial Training for Efficient In-Situ Trade-Off Between Accuracy and Robustness of DNNs

Existing deep neural networks (DNNs) that achieve state-of-the-art (SOTA) performance on both clean and adversarially-perturbed images rely on either activation or weight conditioned convolution operations. However, such conditional…

Computer Vision and Pattern Recognition · Computer Science 2023-02-08 Souvik Kundu, Sairam Sundaresan, Sharath Nittur Sridhar, Shunlin Lu, Han Tang, Peter A. Beerel

On the Interplay Between Sparsity and Training in Deep Reinforcement Learning

We study the benefits of different sparse architectures for deep reinforcement learning. In particular, we focus on image-based domains where spatially-biased and fully-connected architectures are common. Using these and several other…

Machine Learning · Computer Science 2025-02-04 Fatima Davelouis, John D. Martin, Michael Bowling

Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency

Recent research has focused on weight sparsity in deep neural network training to reduce FLOPs, aiming for improved efficiency (test accuracy w.r.t training FLOPs). However, sparse weight training often compromises accuracy, requiring…

Machine Learning · Computer Science 2024-07-19 Vithursan Thangarasa, Shreyas Saxena, Abhay Gupta, Sean Lie

PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation

Dynamic sparsity, where the sparsity patterns are unknown until runtime, poses a significant challenge to deep learning. The state-of-the-art sparsity-aware deep learning solutions are restricted to pre-defined, static sparsity patterns due…

Machine Learning · Computer Science 2023-10-10 Ningxin Zheng, Huiqiang Jiang, Quanlu Zhang, Zhenhua Han, Yuqing Yang, Lingxiao Ma, Fan Yang, Chengruidong Zhang, Lili Qiu, Mao Yang, Lidong Zhou

CAST: Continuous and Differentiable Semi-Structured Sparsity-Aware Training for Large Language Models

Sparsity-aware training is an effective approach for transforming large language models (LLMs) into hardware-friendly sparse patterns, thereby reducing latency and memory consumption during inference. In this paper, we propose Continuous…

Machine Learning · Computer Science 2025-10-01 Weiyu Huang, Yuezhou Hu, Jun Zhu, Jianfei Chen

Mixed Sparsity Training: Achieving 4$\times$ FLOP Reduction for Transformer Pretraining

Large language models (LLMs) have made significant strides in complex tasks, yet their widespread adoption is impeded by substantial computational demands. With hundreds of billion parameters, transformer-based LLMs necessitate months of…

Machine Learning · Computer Science 2024-08-22 Pihe Hu, Shaolong Li, Longbo Huang

Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models

Overparameterized neural networks generalize well but are expensive to train. Ideally, one would like to reduce their computational cost while retaining their generalization benefits. Sparse model training is a simple and promising approach…

Machine Learning · Computer Science 2022-05-12 Tri Dao, Beidi Chen, Kaizhao Liang, Jiaming Yang, Zhao Song, Atri Rudra, Christopher Ré

S-STE: Continuous Pruning Function for Efficient 2:4 Sparse Pre-training

Training deep neural networks (DNNs) is costly. Fortunately, Nvidia Ampere and Hopper GPUs can accelerate matrix multiplications twice as fast as a dense equivalent by implementing 2:4 sparsity. However, previous STE-based 2:4 pre-training…

Machine Learning · Computer Science 2024-12-30 Yuezhou Hu, Jun Zhu, Jianfei Chen