Efficient Dynamic Structured Sparse Training with Learned Shuffles

Abhishek Tyagi; Arjun Iyer; Liam Young; William H Renninger; Christopher Kanan; Yuhao Zhu

Machine Learning 2025-10-17 v1

Abstract

Structured sparsity accelerates training and inference on modern GPUs, yet it still trails unstructured dynamic sparse training (DST) in accuracy. The shortfall stems from a loss of expressivity: whereas a dense layer can realize every possible mask obtained by choosing any $w$ active weights out of $n$, a fixed block or N:M layout explores only a subset of those possibilities. We propose to close this gap by learning, for each layer, a single permutation matrix jointly with the structured weight matrix. Applied to three canonical structures (block, N:M, and diagonals), we show that permutation-augmented DST (PA-DST) matches unstructured baselines (RigL, SET) at 90--95\% sparsity on ImageNet-1K (ViT-B/16) and WikiText-103 (GPT-2), yet trains up to $1.21\times$ and infers up to $2.9\times$ faster. The results position structure + learned permutation as a sweet spot between accuracy and efficiency.
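
The expressivity gap described in the abstract can be made concrete with a small counting exercise. The sketch below is illustrative only and is not the authors' implementation: the helper names (`count_unstructured_masks`, `count_nm_masks`, `is_nm`, `permute`) are hypothetical, the sizes are toy values, and the permutation is fixed by hand rather than learned. It compares the number of unstructured masks with the number of N:M masks on the same weights, then shows that shuffling a valid N:M mask can yield a mask the fixed layout cannot represent.

```python
from math import comb


def count_unstructured_masks(n: int, w: int) -> int:
    """Number of ways to choose w active weights out of n."""
    return comb(n, w)


def count_nm_masks(n: int, N: int, M: int) -> int:
    """Number of masks keeping exactly N weights in each consecutive group of M."""
    assert n % M == 0, "n must be a multiple of M"
    return comb(M, N) ** (n // M)


def is_nm(mask, N: int, M: int) -> bool:
    """Check whether every consecutive group of M entries has exactly N ones."""
    return all(sum(mask[i:i + M]) == N for i in range(0, len(mask), M))


def permute(mask, perm):
    """Reorder a mask so that output position i takes mask[perm[i]]."""
    return [mask[p] for p in perm]


# Toy setting (assumed for illustration): 16 weights with a 2:4 pattern.
n, N, M = 16, 2, 4
w = n * N // M  # 8 active weights in total

unstructured = count_unstructured_masks(n, w)
structured = count_nm_masks(n, N, M)

# A valid 2:4 mask, and a hand-picked (not learned) shuffle of its positions.
base = [1, 1, 0, 0] * 4
perm = [0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
shuffled = permute(base, perm)

print(unstructured, structured)               # 12870 1296
print(is_nm(base, N, M), is_nm(shuffled, N, M))  # True False
```

In this toy case the fixed 2:4 layout covers 1296 of the 12870 masks with eight active weights, and the shuffled mask keeps the same number of active weights while falling outside the 2:4 set, which is the kind of additional pattern a permutation makes reachable.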

Keywords

sparse learning; sparse matrix multiplication; neural network training

Cite

@article{arxiv.2510.14812,
  title  = {Efficient Dynamic Structured Sparse Training with Learned Shuffles},
  author = {Abhishek Tyagi and Arjun Iyer and Liam Young and William H Renninger and Christopher Kanan and Yuhao Zhu},
  journal= {arXiv preprint arXiv:2510.14812},
  year   = {2025}
}
