Related papers: Regularizing Class-wise Predictions via Self-knowl…

Embracing the Dark Knowledge: Domain Generalization Using Regularized Knowledge Distillation

Though convolutional neural networks are widely used in different tasks, lack of generalization capability in the absence of sufficient and representative data is one of the challenges that hinder their practical application. In this paper,…

Computer Vision and Pattern Recognition · Computer Science 2021-07-07 Yufei Wang, Haoliang Li, Lap-pui Chau, Alex C. Kot

Self-Knowledge Distillation with Progressive Refinement of Targets

The generalization capability of deep neural networks has been substantially improved by applying a wide spectrum of regularization methods, e.g., restricting function space, injecting randomness during training, augmenting data, etc. In…

Machine Learning · Computer Science 2021-10-08 Kyungyul Kim, ByeongMoon Ji, Doyoung Yoon, Sangheum Hwang

SRIL: Selective Regularization for Class-Incremental Learning

Human intelligence gradually accepts new information and accumulates knowledge throughout the lifespan. However, deep learning models suffer from a catastrophic forgetting phenomenon, where they forget previous knowledge when acquiring new…

Computer Vision and Pattern Recognition · Computer Science 2023-05-10 Jisu Han, Jaemin Na, Wonjun Hwang

Adaptive Regularization of Labels

Recently, a variety of regularization techniques have been widely applied in deep neural networks, such as dropout, batch normalization, data augmentation, and so on. These methods mainly focus on the regularization of weight parameters to…

Machine Learning · Computer Science 2019-08-16 Qianggang Ding, Sifan Wu, Hao Sun, Jiadong Guo, Shu-Tao Xia

Even your Teacher Needs Guidance: Ground-Truth Targets Dampen Regularization Imposed by Self-Distillation

Knowledge distillation is classically a procedure where a neural network is trained on the output of another network along with the original targets in order to transfer knowledge between the architectures. The special case of…

Machine Learning · Computer Science 2021-10-18 Kenneth Borup, Lars N. Andersen

Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation

Convolutional neural networks have been widely deployed in various application scenarios. In order to extend the applications' boundaries to some accuracy-crucial domains, researchers have been investigating approaches to boost accuracy…

Machine Learning · Computer Science 2019-05-21 Linfeng Zhang, Jiebo Song, Anni Gao, Jingwei Chen, Chenglong Bao, Kaisheng Ma

Self-Knowledge Distillation for Learning Ambiguity

Recent language models have shown remarkable performance on natural language understanding (NLU) tasks. However, they are often sub-optimal when faced with ambiguous samples that can be interpreted in multiple ways, over-confidently…

Computation and Language · Computer Science 2024-06-17 Hancheol Park, Soyeong Jeong, Sukmin Cho, Jong C. Park

Improving Generalization of Metric Learning via Listwise Self-distillation

Most deep metric learning (DML) methods employ a strategy that forces all positive samples to be close in the embedding space while keeping them away from negative ones. However, such a strategy ignores the internal relationships of…

Computer Vision and Pattern Recognition · Computer Science 2022-06-20 Zelong Zeng, Fan Yang, Zheng Wang, Shin'ichi Satoh

Self-Distillation as Instance-Specific Label Smoothing

It has been recently demonstrated that multi-generational self-distillation can improve generalization. Despite this intriguing observation, reasons for the enhancement remain poorly understood. In this paper, we first demonstrate…

Machine Learning · Computer Science 2020-10-23 Zhilu Zhang, Mert R. Sabuncu

Self-Distillation Amplifies Regularization in Hilbert Space

Knowledge distillation introduced in the deep learning context is a method to transfer knowledge from one architecture to another. In particular, when the architectures are identical, this is called self-distillation. The idea is to feed in…

Machine Learning · Computer Science 2020-10-27 Hossein Mobahi, Mehrdad Farajtabar, Peter L. Bartlett

Discriminative Distillation to Reduce Class Confusion in Continual Learning

Successful continual learning of new knowledge would enable intelligent systems to recognize more and more classes of objects. However, current intelligent systems often fail to correctly recognize previously learned classes of objects when…

Computer Vision and Pattern Recognition · Computer Science 2021-08-21 Changhong Zhong, Zhiying Cui, Ruixuan Wang, Wei-Shi Zheng

Does Knowledge Distillation Really Work?

Knowledge distillation is a popular technique for training a small student network to emulate a larger teacher model, such as an ensemble of networks. We show that while knowledge distillation can improve student generalization, it does not…

Machine Learning · Computer Science 2021-12-07 Samuel Stanton, Pavel Izmailov, Polina Kirichenko, Alexander A. Alemi, Andrew Gordon Wilson

Deep Probabilistic Supervision for Image Classification

Supervised training of deep neural networks for classification typically relies on hard targets, which promote overconfidence and can limit calibration, generalization, and robustness. Self-distillation methods aim to mitigate this by…

Computer Vision and Pattern Recognition · Computer Science 2026-02-06 Anton Adelöw, Matteo Gamba, Atsuto Maki

Distilling Visual Priors from Self-Supervised Learning

Convolutional Neural Networks (CNNs) are prone to overfit small training datasets. We present a novel two-phase pipeline that leverages self-supervised learning and knowledge distillation to improve the generalization ability of CNN models…

Computer Vision and Pattern Recognition · Computer Science 2020-08-04 Bingchen Zhao, Xin Wen

AI-KD: Adversarial learning and Implicit regularization for self-Knowledge Distillation

We present a novel adversarial penalized self-knowledge distillation method, named adversarial learning and implicit regularization for self-knowledge distillation (AI-KD), which regularizes the training procedure by adversarial learning…

Computer Vision and Pattern Recognition · Computer Science 2024-03-22 Hyungmin Kim, Sungho Suh, Sunghyun Baek, Daehwan Kim, Daun Jeong, Hansang Cho, Junmo Kim

A New Training Framework for Deep Neural Network

Knowledge distillation is the process of transferring the knowledge from a large model to a small model. In this process, the small model learns the generalization ability of the large model and retains the performance close to that of the…

Machine Learning · Computer Science 2021-03-26 Zhenyan Hou, Wenxuan Fan

Knowledge Distillation as Semiparametric Inference

A popular approach to model compression is to train an inexpensive student model to mimic the class probabilities of a highly accurate but cumbersome teacher model. Surprisingly, this two-step knowledge distillation process often leads to…

Machine Learning · Statistics 2021-04-21 Tri Dao, Govinda M Kamath, Vasilis Syrgkanis, Lester Mackey

Self-Knowledge Distillation in Natural Language Processing

Since deep learning became a key player in natural language processing (NLP), many deep learning models have been showing remarkable performances in a variety of NLP tasks, and in some cases, they are even outperforming humans. Such high…

Computation and Language · Computer Science 2019-08-07 Sangchul Hahn, Heeyoul Choi

Self-Knowledge Distillation via Dropout

To boost the performance, deep neural networks require deeper or wider network structures that involve massive computational and memory costs. To alleviate this issue, the self-knowledge distillation method regularizes the model by…

Computer Vision and Pattern Recognition · Computer Science 2022-08-12 Hyoje Lee, Yeachan Park, Hyun Seo, Myungjoo Kang

Subclass Distillation

After a large "teacher" neural network has been trained on labeled data, the probabilities that the teacher assigns to incorrect classes reveal a lot of information about the way in which the teacher generalizes. By training a small…

Machine Learning · Computer Science 2020-06-12 Rafael Müller, Simon Kornblith, Geoffrey Hinton