Related papers: Block-wise Dynamic Sparseness

An Efficient Training Algorithm for Models with Block-wise Sparsity

Large-scale machine learning (ML) models are increasingly being used in critical domains like education, lending, recruitment, healthcare, criminal justice, etc. However, the training, deployment, and utilization of these models demand…

Machine Learning · Computer Science 2025-03-31 Ding Zhu, Zhiqun Zuo, Mohammad Mahdi Khalili

Sparse Networks from Scratch: Faster Training without Losing Performance

We demonstrate the possibility of what we call sparse learning: accelerated training of deep neural networks that maintain sparse weights throughout training while achieving dense performance levels. We accomplish this by developing sparse…

Machine Learning · Computer Science 2019-08-27 Tim Dettmers, Luke Zettlemoyer

Block-Sparse Recurrent Neural Networks

Recurrent Neural Networks (RNNs) are used in state-of-the-art models in domains such as speech recognition, machine translation, and language modelling. Sparsity is a technique to reduce compute and memory requirements of deep learning…

Machine Learning · Computer Science 2017-11-09 Sharan Narang, Eric Undersander, Gregory Diamos

Sparse-Aware Neural Networks for Nonlinear Functionals: Mitigating the Exponential Dependence on Dimension

Deep neural networks have emerged as powerful tools for learning operators defined over infinite-dimensional function spaces. However, existing theories frequently encounter difficulties related to dimensionality and limited…

Machine Learning · Computer Science 2026-05-12 Jianfei Li, Shuo Huang, Han Feng, Ding-Xuan Zhou, Gitta Kutyniok

Faster Learned Sparse Retrieval with Block-Max Pruning

Learned sparse retrieval systems aim to combine the effectiveness of contextualized language models with the scalability of conventional data structures such as inverted indexes. Nevertheless, the indexes generated by these systems exhibit…

Information Retrieval · Computer Science 2024-05-03 Antonio Mallia, Torsten Suel, Nicola Tonellotto

Weight Block Sparsity: Training, Compilation, and AI Engine Accelerators

Nowadays, increasingly larger Deep Neural Networks (DNNs) are being developed, trained, and utilized. These networks require significant computational resources, putting a strain on both advanced and limited devices. Our solution is to…

Machine Learning · Computer Science 2024-07-16 Paolo D'Alberto, Taehee Jeong, Akshai Jain, Shreyas Manjunath, Mrinal Sarmah, Samuel Hsu, Yaswanth Raparti, Nitesh Pipralia

Truly Sparse Neural Networks at Scale

Recently, sparse training methods have started to be established as a de facto approach for training and inference efficiency in artificial neural networks. Yet, this efficiency is just in theory. In practice, everyone uses a binary mask to…

Machine Learning · Computer Science 2022-07-13 Selima Curci, Decebal Constantin Mocanu, Mykola Pechenizkiy

Radial Networks: Dynamic Layer Routing for High-Performance Large Language Models

Large language models (LLMs) often struggle with strict memory, latency, and power demands. To meet these demands, various forms of dynamic sparsity have been proposed that reduce compute on an input-by-input basis. These methods improve…

Computation and Language · Computer Science 2024-04-09 Jordan Dotzel, Yash Akhauri, Ahmed S. AbouElhamayed, Carly Jiang, Mohamed Abdelfattah, Zhiru Zhang

Top-KAST: Top-K Always Sparse Training

Sparse neural networks are becoming increasingly important as the field seeks to improve the performance of existing models by scaling them up, while simultaneously trying to reduce power consumption and computational footprint.…

Machine Learning · Computer Science 2021-06-08 Siddhant M. Jayakumar, Razvan Pascanu, Jack W. Rae, Simon Osindero, Erich Elsen

Dynamic Sparsity Is Channel-Level Sparsity Learner

Sparse training has received an upsurging interest in machine learning due to its tantalizing saving potential for the entire training process as well as inference. Dynamic sparse training (DST), as a leading sparse training approach, can…

Machine Learning · Computer Science 2023-11-13 Lu Yin, Gen Li, Meng Fang, Li Shen, Tianjin Huang, Zhangyang Wang, Vlado Menkovski, Xiaolong Ma, Mykola Pechenizkiy, Shiwei Liu

Training Neural Networks with Fixed Sparse Masks

During typical gradient-based training of deep neural networks, all of the model's parameters are updated at each iteration. Recent work has shown that it is possible to update only a small subset of the model's parameters during training,…

Machine Learning · Computer Science 2021-11-19 Yi-Lin Sung, Varun Nair, Colin Raffel

Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity

Network pruning can reduce the high computation cost of deep neural network (DNN) models. However, to maintain their accuracies, sparse models often carry randomly-distributed weights, leading to irregular computations. Consequently, sparse…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-01 Cong Guo, Bo Yang Hsueh, Jingwen Leng, Yuxian Qiu, Yue Guan, Zehuan Wang, Xiaoying Jia, Xipeng Li, Minyi Guo, Yuhao Zhu

Predefined Sparseness in Recurrent Sequence Models

Inducing sparseness while training neural networks has been shown to yield models with a lower memory footprint but similar effectiveness to dense models. However, sparseness is typically induced starting from a dense model, and thus this…

Machine Learning · Computer Science 2022-03-30 Thomas Demeester, Johannes Deleu, Fréderic Godin, Chris Develder

Efficient Variational Inference for Sparse Deep Learning with Theoretical Guarantee

Sparse deep learning aims to address the challenge of huge storage consumption by deep neural networks, and to recover the sparse structure of target functions. Although tremendous empirical successes have been achieved, most sparse deep…

Machine Learning · Statistics 2020-11-17 Jincheng Bai, Qifan Song, Guang Cheng

Blocking Techniques for Sparse Matrix Multiplication on Tensor Accelerators

Tensor accelerators have gained popularity because they provide a cheap and efficient solution for speeding up computational-expensive tasks in Deep Learning and, more recently, in other Scientific Computing applications. However, since…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-02-15 Paolo Sylos Labini, Massimo Bernaschi, Francesco Silvestri, Flavio Vella

Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity

Training neural network models with discrete (categorical or structured) latent variables can be computationally challenging, due to the need for marginalization over large or combinatorial sets. To circumvent this issue, one typically…

Machine Learning · Computer Science 2020-12-29 Gonçalo M. Correia, Vlad Niculae, Wilker Aziz, André F. T. Martins

Fast multiplication of random dense matrices with fixed sparse matrices

This work focuses on accelerating the multiplication of a dense random matrix with a (fixed) sparse matrix, which is frequently used in sketching algorithms. We develop a novel scheme that takes advantage of blocking and recomputation…

Computational Engineering, Finance, and Science · Computer Science 2024-05-14 Tianyu Liang, Riley Murray, Aydın Buluç, James Demmel

Exploiting Subgradient Sparsity in Max-Plus Neural Networks

Deep Neural Networks are powerful tools for solving machine learning problems, but their training often involves dense and costly parameter updates. In this work, we use a novel Max-Plus neural architecture in which classical addition and…

Machine Learning · Statistics 2026-03-05 Ikhlas Enaieh, Olivier Fercoq

Batch Active Learning from the Perspective of Sparse Approximation

Active learning enables efficient model training by leveraging interactions between machine learning agents and human annotators. We study and propose a novel framework that formulates batch active learning from the sparse approximation's…

Machine Learning · Computer Science 2022-11-08 Maohao Shen, Bowen Jiang, Jacky Yibo Zhang, Oluwasanmi Koyejo

Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers

We present a novel network pruning algorithm called Dynamic Sparse Training that can jointly find the optimal network parameters and sparse network structure in a unified optimization process with trainable pruning thresholds. These…

Machine Learning · Computer Science 2020-05-15 Junjie Liu, Zhe Xu, Runbin Shi, Ray C. C. Cheung, Hayden K. H. So