Related papers: Few Sample Knowledge Distillation for Efficient Ne…

Distillation from heterogeneous unlabeled collections

Compressing deep networks is essential to expand their range of applications to constrained settings. The need for compression however often arises long after the model was trained, when the original data might no longer be available. On…

Machine Learning · Computer Science 2022-01-19 Jean-Michel Begon, Pierre Geurts

Distilling with Performance Enhanced Students

The task of accelerating large neural networks on general purpose hardware has, in recent years, prompted the use of channel pruning to reduce network size. However, the efficacy of pruning based approaches has since been called into…

Machine Learning · Statistics 2019-03-08 Jack Turner, Elliot J. Crowley, Valentin Radu, José Cano, Amos Storkey, Michael O'Boyle

Few Shot Network Compression via Cross Distillation

Model compression has been widely adopted to obtain light-weighted deep neural networks. Most prevalent methods, however, require fine-tuning with sufficient training data to ensure accuracy, which could be challenged by privacy and…

Machine Learning · Computer Science 2020-05-05 Haoli Bai, Jiaxiang Wu, Irwin King, Michael Lyu

An Efficient Method of Training Small Models for Regression Problems with Knowledge Distillation

Compressing deep neural network (DNN) models becomes a very important and necessary technique for real-world applications, such as deploying those models on mobile devices. Knowledge distillation is one of the most popular methods for model…

Machine Learning · Computer Science 2020-03-02 Makoto Takamoto, Yusuke Morishita, Hitoshi Imaoka

Data-Free Knowledge Distillation for Deep Neural Networks

Recent advances in model compression have provided procedures for compressing large neural networks to a fraction of their original size while retaining most if not all of their accuracy. However, all of these approaches rely on access to…

Machine Learning · Computer Science 2017-11-27 Raphael Gontijo Lopes, Stefano Fenu, Thad Starner

Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher

Knowledge distillation is a strategy of training a student network with guide of the soft output from a teacher network. It has been a successful method of model compression and knowledge transfer. However, currently knowledge distillation…

Machine Learning · Computer Science 2024-10-21 Guangda Ji, Zhanxing Zhu

Recurrent knowledge distillation

Knowledge distillation compacts deep networks by letting a small student network learn from a large teacher network. The accuracy of knowledge distillation recently benefited from adding residual layers. We propose to reduce the size of the…

Computer Vision and Pattern Recognition · Computer Science 2018-05-21 Silvia L. Pintea, Yue Liu, Jan C. van Gemert

Online Ensemble Model Compression using Knowledge Distillation

This paper presents a novel knowledge distillation based model compression framework consisting of a student ensemble. It enables distillation of simultaneously learnt ensemble knowledge onto each of the compressed student models. Each…

Computer Vision and Pattern Recognition · Computer Science 2020-11-17 Devesh Walawalkar, Zhiqiang Shen, Marios Savvides

Progressive Label Distillation: Learning Input-Efficient Deep Neural Networks

Much of the focus in the area of knowledge distillation has been on distilling knowledge from a larger teacher network to a smaller student network. However, there has been little research on how the concept of distillation can be leveraged…

Neural and Evolutionary Computing · Computer Science 2019-01-29 Zhong Qiu Lin, Alexander Wong

Distilling the Knowledge in Data Pruning

With the increasing size of datasets used for training neural networks, data pruning becomes an attractive field of research. However, most current data pruning algorithms are limited in their ability to preserve accuracy compared to models…

Computer Vision and Pattern Recognition · Computer Science 2024-08-15 Emanuel Ben-Baruch, Adam Botach, Igor Kviatkovsky, Manoj Aggarwal, Gérard Medioni

Progressive Network Grafting for Few-Shot Knowledge Distillation

Knowledge distillation has demonstrated encouraging performances in deep model compression. Most existing approaches, however, require massive labeled data to accomplish the knowledge transfer, making the model compression a cumbersome and…

Computer Vision and Pattern Recognition · Computer Science 2020-12-14 Chengchao Shen, Xinchao Wang, Youtan Yin, Jie Song, Sihui Luo, Mingli Song

Training convolutional neural networks with cheap convolutions and online distillation

The large memory and computation consumption in convolutional neural networks (CNNs) has been one of the main barriers for deploying them on resource-limited systems. To this end, most cheap convolutions (e.g., group convolution, depth-wise…

Computer Vision and Pattern Recognition · Computer Science 2019-10-11 Jiao Xie, Shaohui Lin, Yichen Zhang, Linkai Luo

Convex Distillation: Efficient Compression of Deep Networks via Convex Optimization

Deploying large and complex deep neural networks on resource-constrained edge devices poses significant challenges due to their computational demands and the complexities of non-convex optimization. Traditional compression methods such as…

Machine Learning · Computer Science 2024-10-10 Prateek Varshney, Mert Pilanci

Compact CNN Structure Learning by Knowledge Distillation

The concept of compressing deep Convolutional Neural Networks (CNNs) is essential to use limited computation, power, and memory resources on embedded devices. However, existing methods achieve this objective at the cost of a drop in…

Computer Vision and Pattern Recognition · Computer Science 2021-04-20 Waqar Ahmed, Andrea Zunino, Pietro Morerio, Vittorio Murino

Apprentice: Using Knowledge Distillation Techniques To Improve Low-Precision Network Accuracy

Deep learning networks have achieved state-of-the-art accuracies on computer vision workloads like image classification and object detection. The performant systems, however, typically involve big models with numerous parameters. Once…

Machine Learning · Computer Science 2017-11-17 Asit Mishra, Debbie Marr

Data-Free Knowledge Distillation with Soft Targeted Transfer Set Synthesis

Knowledge distillation (KD) has proved to be an effective approach for deep neural network compression, which learns a compact network (student) by transferring the knowledge from a pre-trained, over-parameterized network (teacher). In…

Machine Learning · Computer Science 2021-04-13 Zi Wang

Synthetic data generation method for data-free knowledge distillation in regression neural networks

Knowledge distillation is the technique of compressing a larger neural network, known as the teacher, into a smaller neural network, known as the student, while still trying to maintain the performance of the larger neural network as much…

Machine Learning · Computer Science 2023-05-11 Tianxun Zhou, Keng-Hwee Chiam

Beyond Student: An Asymmetric Network for Neural Network Inheritance

Knowledge Distillation (KD) has emerged as a powerful technique for model compression, enabling lightweight student networks to benefit from the performance of redundant teacher networks. However, the inherent capacity gap often limits the…

Machine Learning · Computer Science 2026-02-12 Yiyun Zhou, Jingwei Shi, Mingjing Xu, Zhonghua Jiang, Jingyuan Chen

Knowledge Distillation with the Reused Teacher Classifier

Knowledge distillation aims to compress a powerful yet cumbersome teacher model into a lightweight student model without much sacrifice of performance. For this purpose, various approaches have been proposed over the past few years,…

Computer Vision and Pattern Recognition · Computer Science 2022-03-29 Defang Chen, Jian-Ping Mei, Hailin Zhang, Can Wang, Yan Feng, Chun Chen

Densely Guided Knowledge Distillation using Multiple Teacher Assistants

With the success of deep neural networks, knowledge distillation which guides the learning of a small student network from a large teacher network is being actively studied for model compression and transfer learning. However, few studies…

Computer Vision and Pattern Recognition · Computer Science 2021-08-10 Wonchul Son, Jaemin Na, Junyong Choi, Wonjun Hwang