Related papers: Rethinking Centered Kernel Alignment in Knowledge …

Correlation Congruence for Knowledge Distillation

Most teacher-student frameworks based on knowledge distillation (KD) depend on a strong congruent constraint on instance level. However, they usually ignore the correlation between multiple instances, which is also valuable for knowledge…

Computer Vision and Pattern Recognition · Computer Science · 2019-04-04 · Baoyun Peng, Xiao Jin, Jiaheng Liu, Shunfeng Zhou, Yichao Wu, Yu Liu, Dongsheng Li, Zhaoning Zhang

Reliability of CKA as a Similarity Measure in Deep Learning

Comparing learned neural representations in neural networks is a challenging but important problem, which has been approached in different ways. The Centered Kernel Alignment (CKA) similarity metric, particularly its linear variant, has…

Machine Learning · Computer Science · 2022-11-17 · MohammadReza Davari, Stefan Horoi, Amine Natik, Guillaume Lajoie, Guy Wolf, Eugene Belilovsky

Manifold Approximation leads to Robust Kernel Alignment

Centered kernel alignment (CKA) is a popular metric for comparing representations, determining equivalence of networks, and neuroscience research. However, CKA does not account for the underlying manifold and relies on numerous heuristics…

Machine Learning · Computer Science · 2025-10-28 · Mohammad Tariqul Islam, Du Liu, Deblina Sarkar

Knowledge distillation through geometry-aware representational alignment

Knowledge distillation is a common paradigm for transferring capabilities from larger models to smaller ones. While traditional distillation methods leverage a probabilistic divergence over the output of the teacher and student models,…

Machine Learning · Computer Science · 2025-10-01 · Prajjwal Bhattarai, Mohammad Amjad, Dmytro Zhylko, Tuka Alhanai

Cross-Domain Knowledge Distillation for Low-Resolution Human Pose Estimation

In practical applications of human pose estimation, low-resolution inputs frequently occur, and existing state-of-the-art models perform poorly with low-resolution images. This work focuses on boosting the performance of low-resolution…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-21 · Zejun Gu, Zhong-Qiu Zhao, Henghui Ding, Hao Shen, Zhao Zhang, De-Shuang Huang

Correcting Biased Centered Kernel Alignment Measures in Biological and Artificial Neural Networks

Centred Kernel Alignment (CKA) has recently emerged as a popular metric to compare activations from biological and artificial neural networks (ANNs) in order to quantify the alignment between internal representations derived from stimuli…

Neurons and Cognition · Quantitative Biology · 2024-05-03 · Alex Murphy, Joel Zylberberg, Alona Fyshe

One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation

Knowledge distillation (KD) has proven to be a highly effective approach for enhancing model performance through a teacher-student training scheme. However, most existing distillation methods are designed under the assumption that the…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-31 · Zhiwei Hao, Jianyuan Guo, Kai Han, Yehui Tang, Han Hu, Yunhe Wang, Chang Xu

Contrastive Knowledge Amalgamation for Unsupervised Image Classification

Knowledge amalgamation (KA) aims to learn a compact student model to handle the joint objective from multiple teacher models that are each specialized for their own tasks. Current methods focus on coarsely aligning teachers and…

Computer Vision and Pattern Recognition · Computer Science · 2023-07-28 · Shangde Gao, Yichao Fu, Ke Liu, Yuqiang Han

SRA: Span Representation Alignment for Large Language Model Distillation

Cross-Tokenizer Knowledge Distillation (CTKD) enables knowledge transfer between a large language model and a smaller student, even when they employ different tokenizers. While existing approaches mainly focus on token-level alignment…

Computation and Language · Computer Science · 2026-05-05 · Quoc Phong Dao, Hoang Son Nguyen, Pham Khanh Chi, Tung Nguyen, Linh Ngo Van, Nguyen Thi Ngoc Diep, Trung Le

Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models

Knowledge distillation (KD) is a technique for transferring knowledge from complex teacher models to simpler student models, significantly enhancing model efficiency and accuracy. It has demonstrated substantial advancements in various…

Computation and Language · Computer Science · 2025-04-21 · Junjie Yang, Junhao Song, Xudong Han, Ziqian Bi, Tianyang Wang, Chia Xin Liang, Xinyuan Song, Yichao Zhang, Qian Niu, Benji Peng, Keyu Chen, Ming Liu

Efficient and Robust Knowledge Distillation from A Stronger Teacher Based on Correlation Matching

Knowledge Distillation (KD) has emerged as a pivotal technique for neural network compression and performance enhancement. Most KD methods aim to transfer dark knowledge from a cumbersome teacher model to a lightweight student model based…

Machine Learning · Computer Science · 2024-10-10 · Wenqi Niu, Yingchao Wang, Guohui Cai, Hanpo Hou

Improved Knowledge Distillation via Full Kernel Matrix Transfer

Knowledge distillation is an effective way for model compression in deep learning. Given a large model (i.e., teacher model), it aims to improve the performance of a compact model (i.e., student model) by transferring the information from…

Machine Learning · Computer Science · 2022-03-31 · Qi Qian, Hao Li, Juhua Hu

Discriminative and Consistent Representation Distillation

Knowledge Distillation (KD) aims to transfer knowledge from a large teacher model to a smaller student model. While contrastive learning has shown promise in self-supervised learning by creating discriminative representations, its…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-14 · Nikolaos Giakoumoglou, Tania Stathaki

Multi-level Knowledge Distillation via Knowledge Alignment and Correlation

Knowledge distillation (KD) has become an important technique for model compression and knowledge transfer. In this work, we first perform a comprehensive analysis of the knowledge transferred by different KD methods. We demonstrate that…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-07 · Fei Ding, Yin Yang, Hongxin Hu, Venkat Krovi, Feng Luo

Preview-based Category Contrastive Learning for Knowledge Distillation

Knowledge distillation is a mainstream algorithm in model compression that transfers knowledge from the larger model (teacher) to the smaller model (student) to improve the performance of the student. Despite many efforts, existing methods…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-21 · Muhe Ding, Jianlong Wu, Xue Dong, Xiaojie Li, Pengda Qin, Tian Gan, Liqiang Nie

LLaVA-CKD: Bottom-Up Cascaded Knowledge Distillation for Vision-Language Models

Large Vision-Language Models (VLMs) are successful in addressing a multitude of vision-language understanding tasks, such as Visual Question Answering (VQA), but their memory and compute requirements remain a concern for practical…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-12 · Nikolaos Gkalelis, Vasileios Mezaris

Multi-Scale Aligned Distillation for Low-Resolution Detection

In instance-level detection tasks (e.g., object detection), reducing input resolution is an easy option to improve runtime efficiency. However, this option traditionally hurts detection performance considerably. This paper focuses on boosting…

Computer Vision and Pattern Recognition · Computer Science · 2021-09-16 · Lu Qi, Jason Kuen, Jiuxiang Gu, Zhe Lin, Yi Wang, Yukang Chen, Yanwei Li, Jiaya Jia

Confidence-Aware Multi-Teacher Knowledge Distillation

Knowledge distillation was initially introduced to utilize additional supervision from a single teacher model for the student model training. To boost the student performance, some recent variants attempt to exploit diverse knowledge sources…

Machine Learning · Computer Science · 2022-02-15 · Hailin Zhang, Defang Chen, Can Wang

Dual-Space Knowledge Distillation with Key-Query Matching for Large Language Models with Vocabulary Mismatch

Large language models (LLMs) achieve state-of-the-art (SOTA) performance across language tasks, but are costly to deploy due to their size and resource demands. Knowledge Distillation (KD) addresses this by training smaller Student models…

Computation and Language · Computer Science · 2026-05-19 · Stella Eva Tsiapali, Cong-Thanh Do, Kate Knill

Data-Efficient Ranking Distillation for Image Retrieval

Recent advances in deep learning have led to rapid developments in the field of image retrieval. However, the best performing architectures incur significant computational cost. Recent approaches tackle this issue using knowledge…

Computer Vision and Pattern Recognition · Computer Science · 2020-07-14 · Zakaria Laskar, Juho Kannala