Related papers: An Efficient Training Algorithm for Models with Bl…

Computation on Sparse Neural Networks: an Inspiration for Future Hardware

Neural network models are widely used in solving many challenging problems, such as computer vision, personalized recommendation, and natural language processing. Those models are very computationally intensive and reach the hardware limit…

Machine Learning · Computer Science 2020-04-28 Fei Sun , Minghai Qin , Tianyun Zhang , Liu Liu , Yen-Kuang Chen , Yuan Xie

Truly Sparse Neural Networks at Scale

Recently, sparse training methods have started to be established as a de facto approach for training and inference efficiency in artificial neural networks. Yet, this efficiency is just in theory. In practice, everyone uses a binary mask to…

Machine Learning · Computer Science 2022-07-13 Selima Curci , Decebal Constantin Mocanu , Mykola Pechenizkiyi

Learning Large Scale Sparse Models

In this work, we consider learning sparse models in large scale settings, where the number of samples and the feature dimension can grow as large as millions or billions. Two immediate issues occur under such challenging scenario: (i)…

Machine Learning · Statistics 2023-01-31 Atul Dhingra , Jie Shen , Nicholas Kleene

Accurate Neural Network Pruning Requires Rethinking Sparse Optimization

Obtaining versions of deep neural networks that are both highly-accurate and highly-sparse is one of the main challenges in the area of model compression, and several high-performance pruning techniques have been investigated by the…

Machine Learning · Computer Science 2023-09-11 Denis Kuznedelev , Eldar Kurtic , Eugenia Iofinova , Elias Frantar , Alexandra Peste , Dan Alistarh

Sparse Networks from Scratch: Faster Training without Losing Performance

We demonstrate the possibility of what we call sparse learning: accelerated training of deep neural networks that maintain sparse weights throughout training while achieving dense performance levels. We accomplish this by developing sparse…

Machine Learning · Computer Science 2019-08-27 Tim Dettmers , Luke Zettlemoyer

Block-wise Dynamic Sparseness

Neural networks have achieved state of the art performance across a wide variety of machine learning tasks, often with large and computation-heavy models. Inducing sparseness as a way to reduce the memory and computation footprint of these…

Machine Learning · Computer Science 2022-10-21 Amir Hadifar , Johannes Deleu , Chris Develder , Thomas Demeester

Efficient In-Memory Acceleration of Sparse Block Diagonal LLMs

Structured sparsity enables deploying large language models (LLMs) on resource-constrained systems. Approaches like dense-to-sparse fine-tuning are particularly compelling, achieving remarkable structured sparsity by reducing the model size…

Hardware Architecture · Computer Science 2025-10-14 João Paulo Cardoso de Lima , Marc Dietrich , Jeronimo Castrillon , Asif Ali Khan

Block-Sparse Recurrent Neural Networks

Recurrent Neural Networks (RNNs) are used in state-of-the-art models in domains such as speech recognition, machine translation, and language modelling. Sparsity is a technique to reduce compute and memory requirements of deep learning…

Machine Learning · Computer Science 2017-11-09 Sharan Narang , Eric Undersander , Gregory Diamos

MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge

Recently, a new trend of exploring sparsity for accelerating neural network training has emerged, embracing the paradigm of training on the edge. This paper proposes a novel Memory-Economic Sparse Training (MEST) framework targeting for…

Machine Learning · Computer Science 2021-10-28 Geng Yuan , Xiaolong Ma , Wei Niu , Zhengang Li , Zhenglun Kong , Ning Liu , Yifan Gong , Zheng Zhan , Chaoyang He , Qing Jin , Siyue Wang , Minghai Qin , Bin Ren , Yanzhi Wang , Sijia Liu , Xue Lin

Training Sparse Neural Networks

Deep neural networks with lots of parameters are typically used for large-scale computer vision tasks such as image classification. This is a result of using dense matrix multiplications and convolutions. However, sparse computations are…

Computer Vision and Pattern Recognition · Computer Science 2016-11-22 Suraj Srinivas , Akshayvarun Subramanya , R. Venkatesh Babu

Navigating Extremes: Dynamic Sparsity in Large Output Spaces

In recent years, Dynamic Sparse Training (DST) has emerged as an alternative to post-training pruning for generating efficient models. In principle, DST allows for a more memory efficient training process, as it maintains sparsity…

Machine Learning · Computer Science 2025-02-11 Nasib Ullah , Erik Schultheis , Mike Lasby , Yani Ioannou , Rohit Babbar

Sparse Learning for Variable Selection with Structures and Nonlinearities

In this thesis we discuss machine learning methods performing automated variable selection for learning sparse predictive models. There are multiple reasons for promoting sparsity in the predictive models. By relying on a limited set of…

Machine Learning · Computer Science 2019-03-27 Magda Gregorova

Scalable Parameter and Memory Efficient Pretraining for LLM: Recent Algorithmic Advances and Benchmarking

Fueled by their remarkable ability to tackle diverse tasks across multiple domains, large language models (LLMs) have grown at an unprecedented rate, with some recent models containing trillions of parameters. This growth is accompanied by…

Machine Learning · Computer Science 2025-05-30 Athanasios Glentis , Jiaxiang Li , Qiulin Shang , Andi Han , Ioannis Tsaknakis , Quan Wei , Mingyi Hong

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models

Mixture-of-Experts (MoE) language models can reduce computational costs by 2-4$\times$ compared to dense models without sacrificing performance, making them more efficient in computation-bounded scenarios. However, MoE models generally…

Machine Learning · Computer Science 2024-04-09 Bowen Pan , Yikang Shen , Haokun Liu , Mayank Mishra , Gaoyuan Zhang , Aude Oliva , Colin Raffel , Rameswar Panda

BLoad: Enhancing Neural Network Training with Efficient Sequential Data Handling

The increasing complexity of modern deep neural network models and the expanding sizes of datasets necessitate the development of optimized and scalable training methods. In this white paper, we addressed the challenge of efficiently…

Machine Learning · Computer Science 2024-04-29 Raphael Ruschel , A. S. M. Iftekhar , B. S. Manjunath , Suya You

Train Less, Infer Faster: Efficient Model Finetuning and Compression via Structured Sparsity

Fully finetuning foundation language models (LMs) with billions of parameters is often impractical due to high computational costs, memory requirements, and the risk of overfitting. Although methods like low-rank adapters help address these…

Machine Learning · Computer Science 2026-02-11 Jonathan Svirsky , Yehonathan Refael , Ofir Lindenbaum

Search Spaces for Neural Model Training

While larger neural models are pushing the boundaries of what deep learning can do, often more weights are needed to train models rather than to run inference for tasks. This paper seeks to understand this behavior using search spaces --…

Machine Learning · Computer Science 2021-05-28 Darko Stosic , Dusan Stosic

Sparse-to-Sparse Training of Diffusion Models

Diffusion models (DMs) are a powerful type of generative models that have achieved state-of-the-art results in various image synthesis tasks and have shown potential in other domains, such as natural language processing and temporal data…

Machine Learning · Computer Science 2026-02-05 Inês Cardoso Oliveira , Decebal Constantin Mocanu , Luis A. Leiva

Dive into Big Model Training

The increasing scale of model size and continuous improvement of performance herald the arrival of the Big Model era. In this report, we explore what and how the big model training works by diving into training objectives and training…

Machine Learning · Computer Science 2022-07-26 Qinghua Liu , Yuxiang Jiang

Weight Block Sparsity: Training, Compilation, and AI Engine Accelerators

Nowadays, increasingly larger Deep Neural Networks (DNNs) are being developed, trained, and utilized. These networks require significant computational resources, putting a strain on both advanced and limited devices. Our solution is to…

Machine Learning · Computer Science 2024-07-16 Paolo D'Alberto , Taehee Jeong , Akshai Jain , Shreyas Manjunath , Mrinal Sarmah , Samuel Hsu , Yaswanth Raparti , Nitesh Pipralia