Related papers: Complementary Relation Contrastive Distillation

Relational Knowledge Distillation

Knowledge distillation aims at transferring knowledge acquired in one model (a teacher) to another model (a student) that is typically smaller. Previous approaches can be expressed as a form of training the student to mimic output…

Computer Vision and Pattern Recognition · Computer Science · 2019-05-02 · Wonpyo Park, Dongju Kim, Yan Lu, Minsu Cho

Categorical Relation-Preserving Contrastive Knowledge Distillation for Medical Image Classification

The number of medical images available for training deep classification models is typically very small, making these deep models prone to overfitting the training data. Studies showed that knowledge distillation (KD), especially the mean-teacher…

Computer Vision and Pattern Recognition · Computer Science · 2021-07-08 · Xiaohan Xing, Yuenan Hou, Hang Li, Yixuan Yuan, Hongsheng Li, Max Q.-H. Meng

Discriminative and Consistent Representation Distillation

Knowledge Distillation (KD) aims to transfer knowledge from a large teacher model to a smaller student model. While contrastive learning has shown promise in self-supervised learning by creating discriminative representations, its…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-14 · Nikolaos Giakoumoglou, Tania Stathaki

Contrastive Representation Distillation

Often we wish to transfer representational knowledge from one neural network to another. Examples include distilling a large network into a smaller one, transferring knowledge from one sensory modality to a second, or ensembling a…

Machine Learning · Computer Science · 2022-01-26 · Yonglong Tian, Dilip Krishnan, Phillip Isola

Cooperative Knowledge Distillation: A Learner Agnostic Approach

Knowledge distillation is a simple but powerful way to transfer knowledge from a teacher model to a student model. Existing work suffers from at least one of the following key limitations in terms of direction and scope of transfer which…

Machine Learning · Computer Science · 2024-02-12 · Michael Livanos, Ian Davidson, Stephen Wong

Relational Representation Distillation

Knowledge distillation involves transferring knowledge from large, cumbersome teacher models to more compact student models. The standard approach minimizes the Kullback-Leibler (KL) divergence between the probabilistic outputs of a teacher…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-14 · Nikolaos Giakoumoglou, Tania Stathaki

Continual Collaborative Distillation for Recommender System

Knowledge distillation (KD) has emerged as a promising technique for addressing the computational challenges associated with deploying large-scale recommender systems. KD transfers the knowledge of a massive teacher system to a compact…

Information Retrieval · Computer Science · 2024-06-27 · Gyuseok Lee, SeongKu Kang, Wonbin Kweon, Hwanjo Yu

Group Relative Knowledge Distillation: Learning from Teacher's Relational Inductive Bias

Knowledge distillation typically transfers knowledge from a teacher model to a student model by minimizing differences between their output distributions. However, existing distillation approaches largely focus on mimicking absolute…

Machine Learning · Computer Science · 2025-04-30 · Chao Li, Changhua Zhou, Jia Chen

Adversarial Contrastive Distillation with Adaptive Denoising

Adversarial Robustness Distillation (ARD) is a novel method to boost the robustness of small models. Unlike general adversarial training, its robust knowledge transfer can be less easily restricted by the model capacity. However, the…

Computer Vision and Pattern Recognition · Computer Science · 2023-02-24 · Yuzheng Wang, Zhaoyu Chen, Dingkang Yang, Yang Liu, Siao Liu, Wenqiang Zhang, Lizhe Qi

CORSD: Class-Oriented Relational Self Distillation

Knowledge distillation is an effective model compression method, but it has some limitations: (1) feature-based distillation methods only focus on distilling the feature map and lack the transfer of the relation of data…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-02 · Muzhou Yu, Sia Huat Tan, Kailu Wu, Runpei Dong, Linfeng Zhang, Kaisheng Ma

Knowledge Condensation Distillation

Knowledge Distillation (KD) transfers the knowledge from a high-capacity teacher network to strengthen a smaller student. Existing methods focus on excavating the knowledge hints and transferring the whole knowledge to the student. However,…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-13 · Chenxin Li, Mingbao Lin, Zhiyuan Ding, Nie Lin, Yihong Zhuang, Yue Huang, Xinghao Ding, Liujuan Cao

Wasserstein Contrastive Representation Distillation

The primary goal of knowledge distillation (KD) is to encapsulate the information of a model learned from a teacher network into a student network, with the latter being more compact than the former. Existing work, e.g., using…

Machine Learning · Computer Science · 2021-03-30 · Liqun Chen, Dong Wang, Zhe Gan, Jingjing Liu, Ricardo Henao, Lawrence Carin

Knowledge Distillation via Token-level Relationship Graph

Knowledge distillation is a powerful technique for transferring knowledge from a pre-trained teacher model to a student model. However, the true potential of knowledge transfer has not been fully explored. Existing approaches primarily…

Machine Learning · Computer Science · 2023-06-23 · Shuoxi Zhang, Hanpeng Liu, Kun He

Contrastive Representation Distillation via Multi-Scale Feature Decoupling

Knowledge distillation enhances the performance of compact student networks by transferring knowledge from more powerful teacher networks without introducing additional parameters. In the feature space, local regions within an individual…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-14 · Cuipeng Wang, Haipeng Wang

Improved Knowledge Distillation via Adversarial Collaboration

Knowledge distillation has become an important approach to obtain a compact yet effective model. To achieve this goal, a small student model is trained to exploit the knowledge of a large well-trained teacher model. However, due to the…

Computer Vision and Pattern Recognition · Computer Science · 2021-11-30 · Zhiqiang Liu, Chengkai Huang, Yanxia Liu

Collaborative Distillation for Top-N Recommendation

Knowledge distillation (KD) is a well-known method to reduce inference latency by compressing a cumbersome teacher model to a small student model. Despite the success of KD in the classification task, applying KD to recommender models is…

Machine Learning · Computer Science · 2019-11-14 · Jae-woong Lee, Minjin Choi, Jongwuk Lee, Hyunjung Shim

CLIP-RD: Relative Distillation for Efficient CLIP Knowledge Distillation

CLIP aligns image and text embeddings via contrastive learning and demonstrates strong zero-shot generalization. Its large-scale architecture requires substantial computational and memory resources, motivating the distillation of its…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-23 · Jeannie Chung, Hanna Jang, Ingyeong Yang, Uiwon Hwang, Jaehyeong Sim

BicKD: Bilateral Contrastive Knowledge Distillation

Knowledge distillation (KD) is a machine learning framework that transfers knowledge from a teacher model to a student model. The vanilla KD proposed by Hinton et al. has been the dominant approach in logit-based distillation and…

Machine Learning · Computer Science · 2026-05-01 · Jiangnan Zhu, Yukai Xu, Li Xiong, Yixuan Liu, Junxu Liu, Hong kyu Lee, Yujie Gu

DistilCSE: Effective Knowledge Distillation For Contrastive Sentence Embeddings

Large-scale contrastive learning models can learn very informative sentence embeddings, but are hard to serve online due to the huge model size. Therefore, they often play the role of "teacher", transferring abilities to small "student"…

Artificial Intelligence · Computer Science · 2023-01-31 · Chaochen Gao, Xing Wu, Peng Wang, Jue Wang, Liangjun Zang, Zhongyuan Wang, Songlin Hu

Correlation Congruence for Knowledge Distillation

Most teacher-student frameworks based on knowledge distillation (KD) depend on a strong congruent constraint at the instance level. However, they usually ignore the correlation between multiple instances, which is also valuable for knowledge…

Computer Vision and Pattern Recognition · Computer Science · 2019-04-04 · Baoyun Peng, Xiao Jin, Jiaheng Liu, Shunfeng Zhou, Yichao Wu, Yu Liu, Dongsheng Li, Zhaoning Zhang