Related papers: Pruning-Aware Merging for Efficient Multitask Infe…

PackNet: Adding Multiple Tasks to a Single Network by Iterative Pruning

This paper presents a method for adding multiple tasks to a single deep neural network while avoiding catastrophic forgetting. Inspired by network pruning techniques, we exploit redundancies in large deep networks to free up parameters that…

Computer Vision and Pattern Recognition · Computer Science 2018-05-15 Arun Mallya, Svetlana Lazebnik

Learning a Consensus Sub-Network with Polarization Regularization and One Pass Training

The subject of green AI has been gaining attention within the deep learning community given the recent trend of ever larger and more complex neural network models. Existing solutions for reducing the computational load of training at…

Machine Learning · Computer Science 2025-01-13 Xiaoying Zhi, Varun Babbar, Rundong Liu, Pheobe Sun, Fran Silavong, Ruibo Shi, Sean Moran

MIME: Adapting a Single Neural Network for Multi-task Inference with Memory-efficient Dynamic Pruning

Recent years have seen a paradigm shift towards multi-task learning. This calls for memory and energy-efficient solutions for inference in a multi-task scenario. We propose an algorithm-hardware co-design approach called MIME. MIME reuses…

Machine Learning · Computer Science 2022-06-22 Abhiroop Bhattacharjee, Yeshwanth Venkatesha, Abhishek Moitra, Priyadarshini Panda

Network Pruning Spaces

Network pruning techniques, including weight pruning and filter pruning, reveal that most state-of-the-art neural networks can be accelerated without a significant performance drop. This work focuses on filter pruning which enables…

Computer Vision and Pattern Recognition · Computer Science 2023-04-20 Xuanyu He, Yu-I Yang, Ran Song, Jiachen Pu, Conggang Hu, Feijun Jiang, Wei Zhang, Huanghao Ding

Mitigating Parameter Interference in Model Merging via Sharpness-Aware Fine-Tuning

Large-scale deep learning models with a pretraining-finetuning paradigm have led to a surge of numerous task-specific models fine-tuned from a common pre-trained model. Recently, several research efforts have been made on merging these…

Machine Learning · Computer Science 2025-04-22 Yeoreum Lee, Jinwook Jung, Sungyong Baik

Partition Pruning: Parallelization-Aware Pruning for Deep Neural Networks

Parameters of recent neural networks require a huge amount of memory. These parameters are used by neural networks to perform machine learning tasks when processing inputs. To speed up inference, we develop Partition Pruning, an innovative…

Computer Vision and Pattern Recognition · Computer Science 2019-02-28 Sina Shahhosseini, Ahmad Albaqsami, Masoomeh Jasemi, Nader Bagherzadeh

Lost in Pruning: The Effects of Pruning Neural Networks beyond Test Accuracy

Neural network pruning is a popular technique used to reduce the inference costs of modern, potentially overparameterized, networks. Starting from a pre-trained network, the process is as follows: remove redundant parameters, retrain, and…

Machine Learning · Computer Science 2021-03-05 Lucas Liebenwein, Cenk Baykal, Brandon Carter, David Gifford, Daniela Rus

ATM: Improving Model Merging by Alternating Tuning and Merging

Model merging has emerged as a cost-efficient approximation to multitask learning. Among merging strategies, task arithmetic is notable for its simplicity and effectiveness. In this work, we provide a theoretical motivation for task vectors…

Machine Learning · Computer Science 2025-08-11 Luca Zhou, Daniele Solombrino, Donato Crisostomi, Maria Sofia Bucarelli, Fabrizio Silvestri, Emanuele Rodolà

Pruning Early Exit Networks

Deep learning models that perform well often have high computational costs. In this paper, we combine two approaches that try to reduce the computational cost while keeping the model performance high: pruning and early exit networks. We…

Machine Learning · Computer Science 2022-07-12 Alperen Görmez, Erdem Koyuncu

Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference

Efficient machine learning implementations optimized for inference in hardware have wide-ranging benefits, depending on the application, from lower inference latency to higher data throughput and reduced energy consumption. Two popular…

Machine Learning · Computer Science 2021-07-21 Benjamin Hawks, Javier Duarte, Nicholas J. Fraser, Alessandro Pappalardo, Nhan Tran, Yaman Umuroglu

Accelerator-Aware Pruning for Convolutional Neural Networks

Convolutional neural networks have shown tremendous performance capabilities in computer vision tasks, but their excessive amounts of weight storage and arithmetic operations prevent them from being adopted in embedded environments. One of…

Neural and Evolutionary Computing · Computer Science 2020-09-08 Hyeong-Ju Kang

Deep Virtual Networks for Memory Efficient Inference of Multiple Tasks

Deep networks consume a large amount of memory by their nature. A natural question arises: can we reduce that memory requirement whilst maintaining performance? In particular, in this work we address the problem of memory efficient learning…

Computer Vision and Pattern Recognition · Computer Science 2019-04-10 Eunwoo Kim, Chanho Ahn, Philip H. S. Torr, Songhwai Oh

REAM: Merging Improves Pruning of Experts in LLMs

Mixture-of-Experts (MoE) large language models (LLMs) are among the top-performing architectures. The largest models, often with hundreds of billions of parameters, pose significant memory challenges for deployment. Traditional approaches…

Artificial Intelligence · Computer Science 2026-04-07 Saurav Jha, Maryam Hashemzadeh, Ali Saheb Pasand, Ali Parviz, Min-Joong Lee, Boris Knyazev

Pruning by Active Attention Manipulation

Filter pruning of a CNN is typically achieved by applying discrete masks on the CNN's filter weights or activation maps, post-training. Here, we present a new filter-importance-scoring concept named pruning by active attention manipulation…

Computer Vision and Pattern Recognition · Computer Science 2022-10-21 Zahra Babaiee, Lucas Liebenwein, Ramin Hasani, Daniela Rus, Radu Grosu

Path-Adaptive Matting for Efficient Inference Under Various Computational Cost Constraints

In this paper, we explore a novel image matting task aimed at achieving efficient inference under various computational cost constraints, specifically FLOP limitations, using a single matting network. Existing matting methods which have not…

Computer Vision and Pattern Recognition · Computer Science 2025-03-06 Qinglin Liu, Zonglin Li, Xiaoqian Lv, Xin Sun, Ru Li, Shengping Zhang

Effective Network Compression Using Simulation-Guided Iterative Pruning

Existing high-performance deep learning models require very intensive computing. For this reason, it is difficult to embed a deep learning model into a system with limited resources. In this paper, we propose the novel idea of the network…

Machine Learning · Computer Science 2019-02-13 Dae-Woong Jeong, Jaehun Kim, Youngseok Kim, Tae-Ho Kim, Myungsu Chae

Manifold Regularized Dynamic Network Pruning

Neural network pruning is an essential approach for reducing the computational complexity of deep models so that they can be well deployed on resource-limited devices. Compared with conventional methods, the recently developed dynamic…

Computer Vision and Pattern Recognition · Computer Science 2021-03-11 Yehui Tang, Yunhe Wang, Yixing Xu, Yiping Deng, Chao Xu, Dacheng Tao, Chang Xu

Complexity-Aware Training of Deep Neural Networks for Optimal Structure Discovery

We propose a novel algorithm for combined unit and layer pruning of deep neural networks that functions during training and does not require a pre-trained network. Our algorithm optimally trades off learning accuracy and pruning…

Machine Learning · Computer Science 2025-07-17 Valentin Frank Ingmar Guenter, Athanasios Sideris

Neuron Merging: Compensating for Pruned Neurons

Network pruning is widely used to lighten and accelerate neural network models. Structured network pruning discards the whole neuron or filter, leading to accuracy loss. In this work, we propose a novel concept of neuron merging applicable…

Computer Vision and Pattern Recognition · Computer Science 2020-10-27 Woojeong Kim, Suhyun Kim, Mincheol Park, Geonseok Jeon

Rethinking the Value of Network Pruning

Network pruning is widely used for reducing the heavy inference cost of deep models in low-resource settings. A typical pruning algorithm is a three-stage pipeline, i.e., training (a large model), pruning and fine-tuning. During pruning,…

Machine Learning · Computer Science 2019-03-06 Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, Trevor Darrell