Related papers: Sparse-IFT: Sparse Iso-FLOP Transformations for Ma…

Sparse Networks from Scratch: Faster Training without Losing Performance

We demonstrate the possibility of what we call sparse learning: accelerated training of deep neural networks that maintain sparse weights throughout training while achieving dense performance levels. We accomplish this by developing sparse…

Machine Learning · Computer Science 2019-08-27 Tim Dettmers, Luke Zettlemoyer

SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models

The pre-training and fine-tuning paradigm has contributed to a number of breakthroughs in Natural Language Processing (NLP). Instead of directly training on a downstream task, language models are first pre-trained on large datasets with…

Machine Learning · Computer Science 2023-08-01 Vithursan Thangarasa, Abhay Gupta, William Marshall, Tianda Li, Kevin Leong, Dennis DeCoste, Sean Lie, Shreyas Saxena

FLOPs as a Direct Optimization Objective for Learning Sparse Neural Networks

There exists a plethora of techniques for inducing structured sparsity in parametric models during the optimization process, with the final goal of resource-efficient inference. However, few methods target a specific number of…

Machine Learning · Computer Science 2018-11-26 Raphael Tang, Ashutosh Adhikari, Jimmy Lin

Compact Multi-level Sparse Neural Networks with Input Independent Dynamic Rerouting

Deep neural networks (DNNs) have shown to provide superb performance in many real life applications, but their large computation cost and storage requirement have prevented them from being deployed to many edge and internet-of-things (IoT)…

Neural and Evolutionary Computing · Computer Science 2021-12-22 Minghai Qin, Tianyun Zhang, Fei Sun, Yen-Kuang Chen, Makan Fardad, Yanzhi Wang, Yuan Xie

Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training

In this paper, we introduce a new perspective on training deep neural networks capable of state-of-the-art performance without the need for the expensive over-parameterization by proposing the concept of In-Time Over-Parameterization (ITOP)…

Machine Learning · Computer Science 2021-06-16 Shiwei Liu, Lu Yin, Decebal Constantin Mocanu, Mykola Pechenizkiy

Mixed Sparsity Training: Achieving 4$\times$ FLOP Reduction for Transformer Pretraining

Large language models (LLMs) have made significant strides in complex tasks, yet their widespread adoption is impeded by substantial computational demands. With hundreds of billion parameters, transformer-based LLMs necessitate months of…

Machine Learning · Computer Science 2024-08-22 Pihe Hu, Shaolong Li, Longbo Huang

Sparse Weight Activation Training

Neural network training is computationally and memory intensive. Sparse training can reduce the burden on emerging hardware platforms designed to accelerate sparse computations, but it can affect network convergence. In this work, we…

Machine Learning · Computer Science 2020-11-03 Md Aamir Raihan, Tor M. Aamodt

Pushing the Limits of Sparsity: A Bag of Tricks for Extreme Pruning

Pruning of deep neural networks has been an effective technique for reducing model size while preserving most of the performance of dense networks, crucial for deploying models on memory and power-constrained devices. While recent sparse…

Computer Vision and Pattern Recognition · Computer Science 2025-12-02 Andy Li, Aiden Durrant, Milan Markovic, Tianjin Huang, Souvik Kundu, Tianlong Chen, Lu Yin, Georgios Leontidis

Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution

Large language models (LLMs) have achieved remarkable success across various tasks but face deployment challenges due to their massive computational demands. While post-training pruning methods like SparseGPT and Wanda can effectively…

Artificial Intelligence · Computer Science 2026-04-21 Qiao Xiao, Alan Ansell, Boqian Wu, Lu Yin, Mykola Pechenizkiy, Shiwei Liu, Decebal Constantin Mocanu

Sparse Training of Neural Networks based on Multilevel Mirror Descent

We introduce a dynamic sparse training algorithm based on linearized Bregman iterations / mirror descent that exploits the naturally incurred sparsity by alternating between periods of static and dynamic sparsity pattern updates. The key…

Machine Learning · Computer Science 2026-05-19 Yannick Lunk, Sebastian J. Scott, Leon Bungert

Sparse Growing Transformer: Training-Time Sparse Depth Allocation via Progressive Attention Looping

Existing approaches to increasing the effective depth of Transformers predominantly rely on parameter reuse, extending computation through recursive execution. Under this paradigm, the network structure remains static along the training…

Computation and Language · Computer Science 2026-04-17 Yao Chen, Yilong Chen, Yinqi Yang, Junyuan Shang, Zhenyu Zhang, Zefeng Zhang, Shuaiyi Nie, Shuohuan Wang, Yu Sun, Hua Wu, HaiFeng Wang, Tingwen Liu

A Topological Improvement of the Overall Performance of Sparse Evolutionary Training: Motif-Based Structural Optimization of Sparse MLPs Project

Deep Neural Networks (DNNs) have been proven to be exceptionally effective and have been applied across diverse domains within deep learning. However, as DNN models increase in complexity, the demand for reduced computational costs and…

Neural and Evolutionary Computing · Computer Science 2025-06-12 Xiaotian Chen, Hongyun Liu, Seyed Sahand Mohammadi Ziabari

Are Straight-Through gradients and Soft-Thresholding all you need for Sparse Training?

Turning the weights to zero when training a neural network helps in reducing the computational complexity at inference. To progressively increase the sparsity ratio in the network without causing sharp weight discontinuities during…

Computer Vision and Pattern Recognition · Computer Science 2023-01-25 Antoine Vanderschueren, Christophe De Vleeschouwer

Dynamic Sparse Training via Balancing the Exploration-Exploitation Trade-off

Over-parameterization of deep neural networks (DNNs) has shown high prediction accuracy for many applications. Although effective, the large number of parameters hinders its popularity on resource-limited devices and has an outsize…

Machine Learning · Computer Science 2023-04-25 Shaoyi Huang, Bowen Lei, Dongkuan Xu, Hongwu Peng, Yue Sun, Mimi Xie, Caiwen Ding

SparseOpt: Addressing Normalization-induced Gradient Skew in Sparse Training

Dynamic Sparse Training (DST) methods train neural networks by maintaining sparsity while dynamically adapting the network topology. Despite the promise of reduced computation, DST methods converge significantly slower than dense training,…

Machine Learning · Computer Science 2026-05-28 Mohammed Adnan, Rohan Jain, Tom Jacobs, Ekansh Sharma, Rahul G. Krishnan, Rebekka Burkholz, Yani Ioannou

MicroNet: Improving Image Recognition with Extremely Low FLOPs

This paper aims at addressing the problem of substantial performance degradation at extremely low computational cost (e.g. 5M FLOPs on ImageNet classification). We found that two factors, sparse connectivity and dynamic activation function,…

Computer Vision and Pattern Recognition · Computer Science 2021-08-21 Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Lei Zhang, Nuno Vasconcelos

SparseTrain: Leveraging Dynamic Sparsity in Training DNNs on General-Purpose SIMD Processors

Our community has greatly improved the efficiency of deep learning applications, including by exploiting sparsity in inputs. Most of that work, though, is for inference, where weight sparsity is known statically, and/or for specialized…

Machine Learning · Computer Science 2020-12-04 Zhangxiaowen Gong, Houxiang Ji, Christopher Fletcher, Christopher Hughes, Josep Torrellas

SaiT: Sparse Vision Transformers through Adaptive Token Pruning

While vision transformers have achieved impressive results, effectively and efficiently accelerating these models can further boost performances. In this work, we propose a dense/sparse training framework to obtain a unified model, enabling…

Computer Vision and Pattern Recognition · Computer Science 2022-10-13 Ling Li, David Thorsley, Joseph Hassoun

Robustness in sparse artificial neural networks trained with adaptive topology

We investigate the robustness of sparse artificial neural networks trained with adaptive topology. We focus on a simple yet effective architecture consisting of three sparse layers with 99% sparsity followed by a dense layer, applied to…

Machine Learning · Computer Science 2026-02-26 Bendegúz Sulyok, Gergely Palla, Filippo Radicchi, Santo Fortunato

Towards Sparsification of Graph Neural Networks

As real-world graphs expand in size, larger GNN models with billions of parameters are deployed. High parameter count in such models makes training and inference on graphs expensive and challenging. To reduce the computational and memory…

Machine Learning · Computer Science 2023-02-27 Hongwu Peng, Deniz Gurevin, Shaoyi Huang, Tong Geng, Weiwen Jiang, Omer Khan, Caiwen Ding