Related papers: On Compressing U-net Using Knowledge Distillation

Weight Distillation: Transferring the Knowledge in Neural Network Parameters

Knowledge distillation has been proven to be effective in model acceleration and compression. It allows a small network to learn to generalize in the same way as a large network. Recent successes in pre-training suggest the effectiveness of…

Computation and Language · Computer Science · 2021-07-20 · Ye Lin, Yanyang Li, Ziyang Wang, Bei Li, Quan Du, Tong Xiao, Jingbo Zhu

Online Ensemble Model Compression using Knowledge Distillation

This paper presents a novel knowledge distillation based model compression framework consisting of a student ensemble. It enables distillation of simultaneously learnt ensemble knowledge onto each of the compressed student models. Each…

Computer Vision and Pattern Recognition · Computer Science · 2020-11-17 · Devesh Walawalkar, Zhiqiang Shen, Marios Savvides

Data-Free Knowledge Distillation for Deep Neural Networks

Recent advances in model compression have provided procedures for compressing large neural networks to a fraction of their original size while retaining most if not all of their accuracy. However, all of these approaches rely on access to…

Machine Learning · Computer Science · 2017-11-27 · Raphael Gontijo Lopes, Stefano Fenu, Thad Starner

Knowledge Distillation with the Reused Teacher Classifier

Knowledge distillation aims to compress a powerful yet cumbersome teacher model into a lightweight student model without much sacrifice of performance. For this purpose, various approaches have been proposed over the past few years,…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-29 · Defang Chen, Jian-Ping Mei, Hailin Zhang, Can Wang, Yan Feng, Chun Chen

Few Sample Knowledge Distillation for Efficient Network Compression

Deep neural network compression techniques such as pruning and weight tensor decomposition usually require fine-tuning to recover the prediction accuracy when the compression ratio is high. However, conventional fine-tuning suffers from the…

Machine Learning · Computer Science · 2020-04-01 · Tianhong Li, Jianguo Li, Zhuang Liu, Changshui Zhang

A Functional Perspective on Knowledge Distillation in Neural Networks

Knowledge distillation is considered a compression mechanism when judged on the resulting student's accuracy and loss, yet its functional impact is poorly understood. We quantify the compression capacity of knowledge distillation and the…

Machine Learning · Computer Science · 2026-03-17 · Israel Mason-Williams, Gabryel Mason-Williams, Helen Yannakoudakis

Computation-Efficient Knowledge Distillation via Uncertainty-Aware Mixup

Knowledge distillation, which involves extracting the "dark knowledge" from a teacher network to guide the learning of a student network, has emerged as an essential technique for model compression and transfer learning. Unlike previous…

Computer Vision and Pattern Recognition · Computer Science · 2020-12-18 · Guodong Xu, Ziwei Liu, Chen Change Loy

Regularizing Class-wise Predictions via Self-knowledge Distillation

Deep neural networks with millions of parameters may suffer from poor generalization due to overfitting. To mitigate the issue, we propose a new regularization method that penalizes the predictive distribution between similar samples. In…

Machine Learning · Computer Science · 2020-04-08 · Sukmin Yun, Jongjin Park, Kimin Lee, Jinwoo Shin

Data-Efficient Ranking Distillation for Image Retrieval

Recent advances in deep learning have led to rapid developments in the field of image retrieval. However, the best performing architectures incur significant computational cost. Recent approaches tackle this issue using knowledge…

Computer Vision and Pattern Recognition · Computer Science · 2020-07-14 · Zakaria Laskar, Juho Kannala

Distilling the knowledge with quantum neural networks

Quantum Neural Networks (QNNs) are a promising class of quantum machine learning models with potential quantum advantages when implemented on scalable, error-corrected quantum computers. However, as system sizes increase, deploying QNNs…

Quantum Physics · Physics · 2026-03-24 · Yuxuan Yan, Sitian Qian, Qi Zhao, Xingjian Zhang

Compact CNN Structure Learning by Knowledge Distillation

The concept of compressing deep Convolutional Neural Networks (CNNs) is essential to use limited computation, power, and memory resources on embedded devices. However, existing methods achieve this objective at the cost of a drop in…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-20 · Waqar Ahmed, Andrea Zunino, Pietro Morerio, Vittorio Murino

Knowledge Distillation: A Survey

In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to maneuver…

Machine Learning · Computer Science · 2021-05-21 · Jianping Gou, Baosheng Yu, Stephen John Maybank, Dacheng Tao

Efficient Learned Image Compression Through Knowledge Distillation

Learned image compression sits at the intersection of machine learning and image processing. With advances in deep learning, neural network-based compression methods have emerged. In this process, an encoder maps the image to a…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-15 · Fabien Allemand, Attilio Fiandrotti, Sumanta Chaudhuri, Alaa Eddine Mazouz

Knowledge distillation for optimization of quantized deep neural networks

Knowledge distillation (KD) is a very popular method for model size reduction. Recently, the technique has been exploited for training quantized deep neural networks (QDNNs) as a way to restore the performance sacrificed by word-length reduction.…

Machine Learning · Computer Science · 2019-10-24 · Sungho Shin, Yoonho Boo, Wonyong Sung

Be Your Own Best Competitor! Multi-Branched Adversarial Knowledge Transfer

Deep neural network architectures have attained remarkable improvements in scene understanding tasks. Utilizing an efficient model is one of the most important constraints for limited-resource devices. Recently, several compression methods…

Computer Vision and Pattern Recognition · Computer Science · 2020-10-12 · Mahdi Ghorbani, Fahimeh Fooladgar, Shohreh Kasaei

Training convolutional neural networks with cheap convolutions and online distillation

The large memory and computation consumption in convolutional neural networks (CNNs) has been one of the main barriers for deploying them on resource-limited systems. To this end, most cheap convolutions (e.g., group convolution, depth-wise…

Computer Vision and Pattern Recognition · Computer Science · 2019-10-11 · Jiao Xie, Shaohui Lin, Yichen Zhang, Linkai Luo

Learning Metrics from Teachers: Compact Networks for Image Embedding

Metric learning networks are used to compute image embeddings, which are widely used in many applications such as image retrieval and face recognition. In this paper, we propose to use network distillation to efficiently compute image…

Computer Vision and Pattern Recognition · Computer Science · 2019-04-09 · Lu Yu, Vacit Oguz Yazici, Xialei Liu, Joost van de Weijer, Yongmei Cheng, Arnau Ramisa

Data Distillation for Text Classification

Deep learning techniques have achieved great success in many fields, while at the same time deep learning models are getting more complex and expensive to compute. It severely hinders the wide applications of these models. In order to…

Computation and Language · Computer Science · 2021-04-20 · Yongqi Li, Wenjie Li

Controlling the Quality of Distillation in Response-Based Network Compression

The performance of a distillation-based compressed network is governed by the quality of distillation. The reason for the suboptimal distillation of a large network (teacher) to a smaller network (student) is largely attributed to the gap…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-21 · Vibhas Vats, David Crandall

Does Knowledge Distillation Really Work?

Knowledge distillation is a popular technique for training a small student network to emulate a larger teacher model, such as an ensemble of networks. We show that while knowledge distillation can improve student generalization, it does not…

Machine Learning · Computer Science · 2021-12-07 · Samuel Stanton, Pavel Izmailov, Polina Kirichenko, Alexander A. Alemi, Andrew Gordon Wilson