Related papers: Precise Knowledge Transfer via Flow Matching

Harmonizing knowledge Transfer in Neural Network with Unified Distillation

Knowledge distillation (KD), known for its ability to transfer knowledge from a cumbersome network (teacher) to a lightweight one (student) without altering the architecture, has been garnering increasing attention. Two primary categories…

Computer Vision and Pattern Recognition · Computer Science 2024-09-30 Yaomin Huang, Zaomin Yan, Chaomin Shen, Faming Fang, Guixu Zhang

KDFlow: A User-Friendly and Efficient Knowledge Distillation Framework for Large Language Models

Knowledge distillation (KD) is an essential technique to compress large language models (LLMs) into smaller ones. However, despite the distinct roles of the student model and the teacher model in KD, most existing frameworks still use a…

Computation and Language · Computer Science 2026-03-25 Songming Zhang, Xue Zhang, Tong Zhang, Bojie Hu, Yufeng Chen, Jinan Xu

Multi-level Knowledge Distillation via Knowledge Alignment and Correlation

Knowledge distillation (KD) has become an important technique for model compression and knowledge transfer. In this work, we first perform a comprehensive analysis of the knowledge transferred by different KD methods. We demonstrate that…

Computer Vision and Pattern Recognition · Computer Science 2021-06-07 Fei Ding, Yin Yang, Hongxin Hu, Venkat Krovi, Feng Luo

KD$^{2}$M: A unifying framework for feature knowledge distillation

Knowledge Distillation (KD) seeks to transfer the knowledge of a teacher to a student neural net. This process is often done by matching the networks' predictions (i.e., their outputs), but recently several works have proposed to…

Machine Learning · Statistics 2025-09-09 Eduardo Fernandes Montesuma

Knowledge Transfer via Dense Cross-Layer Mutual-Distillation

Knowledge Distillation (KD) based methods adopt the one-way Knowledge Transfer (KT) scheme in which training a lower-capacity student network is guided by a pre-trained high-capacity teacher network. Recently, Deep Mutual Learning (DML)…

Computer Vision and Pattern Recognition · Computer Science 2020-08-19 Anbang Yao, Dawei Sun

Heterogeneous Knowledge Distillation using Information Flow Modeling

Knowledge Distillation (KD) methods are capable of transferring the knowledge encoded in a large and complex teacher into a smaller and faster student. Early methods were usually limited to transferring the knowledge only between the last…

Computer Vision and Pattern Recognition · Computer Science 2020-05-05 Nikolaos Passalis, Maria Tzelepi, Anastasios Tefas

Like What You Like: Knowledge Distill via Neuron Selectivity Transfer

Although deep neural networks have demonstrated extraordinary power in various applications, their superior performance comes at the expense of high storage and computational costs. Consequently, the acceleration and compression of neural…

Computer Vision and Pattern Recognition · Computer Science 2017-12-20 Zehao Huang, Naiyan Wang

Decouple Non-parametric Knowledge Distillation For End-to-end Speech Translation

Existing techniques often attempt to transfer knowledge from a powerful machine translation (MT) model to a speech translation (ST) model using elaborate techniques, which often require transcription as extra input during training.…

Computation and Language · Computer Science 2023-04-21 Hao Zhang, Nianwen Si, Yaqi Chen, Wenlin Zhang, Xukui Yang, Dan Qu, Zhen Li

Consistency Flow Matching: Defining Straight Flows with Velocity Consistency

Flow matching (FM) is a general framework for defining probability paths via Ordinary Differential Equations (ODEs) to transform between noise and data samples. Recent approaches attempt to straighten these flow trajectories to generate…

Computer Vision and Pattern Recognition · Computer Science 2024-07-03 Ling Yang, Zixiang Zhang, Zhilong Zhang, Xingchao Liu, Minkai Xu, Wentao Zhang, Chenlin Meng, Stefano Ermon, Bin Cui

Knowledge distillation is a popular paradigm for learning portable neural networks by transferring the knowledge from a large model into a smaller one. Most existing approaches enhance the student model by utilizing the similarity…

Computer Vision and Pattern Recognition · Computer Science 2021-03-19 Haoran Zhao, Kun Gong, Xin Sun, Junyu Dong, Hui Yu

VRM: Knowledge Distillation via Virtual Relation Matching

Knowledge distillation (KD) aims to transfer the knowledge of a more capable yet cumbersome teacher model to a lightweight student model. In recent years, relation-based KD methods have fallen behind, as their instance-matching counterparts…

Computer Vision and Pattern Recognition · Computer Science 2025-08-01 Weijia Zhang, Fei Xie, Weidong Cai, Chao Ma

Integrating Knowledge Distillation Methods: A Sequential Multi-Stage Framework

Knowledge distillation (KD) transfers knowledge from large teacher models to compact student models, enabling efficient deployment on resource-constrained devices. While diverse KD methods, including response-based, feature-based, and…

Machine Learning · Computer Science 2026-01-23 Yinxi Tian, Changwu Huang, Ke Tang, Xin Yao

UHKD: A Unified Framework for Heterogeneous Knowledge Distillation via Frequency-Domain Representations

Knowledge distillation (KD) is an effective model compression technique that transfers knowledge from a high-performance teacher to a lightweight student, reducing computational and storage costs while maintaining competitive accuracy.…

Computer Vision and Pattern Recognition · Computer Science 2025-11-17 Fengming Yu, Haiwei Pan, Kejia Zhang, Jian Guan, Haiying Jiang

Classroom-Inspired Multi-Mentor Distillation with Adaptive Learning Strategies

We propose ClassroomKD, a novel multi-mentor knowledge distillation framework inspired by classroom environments to enhance knowledge transfer between the student and multiple mentors with different knowledge levels. Unlike traditional…

Computer Vision and Pattern Recognition · Computer Science 2025-03-18 Shalini Sarode, Muhammad Saif Ullah Khan, Tahira Shehzadi, Didier Stricker, Muhammad Zeshan Afzal

Semi-Online Knowledge Distillation

Knowledge distillation is an effective and stable method for model compression via knowledge transfer. Conventional knowledge distillation (KD) is to transfer knowledge from a large and well pre-trained teacher network to a small student…

Computer Vision and Pattern Recognition · Computer Science 2021-11-24 Zhiqiang Liu, Yanxia Liu, Chengkai Huang

Parameter-Efficient and Student-Friendly Knowledge Distillation

Knowledge distillation (KD) has been extensively employed to transfer the knowledge from a large teacher model to the smaller students, where the parameters of the teacher are fixed (or partially) during training. Recent studies show that…

Machine Learning · Computer Science 2022-06-01 Jun Rao, Xv Meng, Liang Ding, Shuhan Qi, Dacheng Tao

Collaborative Teacher-Student Learning via Multiple Knowledge Transfer

Knowledge distillation (KD), as an efficient and effective model compression technique, has been receiving considerable attention in deep learning. The key to its success is to transfer knowledge from a large teacher network to a small…

Machine Learning · Computer Science 2021-01-28 Liyuan Sun, Jianping Gou, Baosheng Yu, Lan Du, Dacheng Tao

FEED: Feature-level Ensemble for Knowledge Distillation

Knowledge Distillation (KD) aims to transfer knowledge in a teacher-student framework, by providing the predictions of the teacher network to the student network in the training stage to help the student network generalize better. It can…

Computer Vision and Pattern Recognition · Computer Science 2019-09-25 SeongUk Park, Nojun Kwak

FreeKD: Free-direction Knowledge Distillation for Graph Neural Networks

Knowledge distillation (KD) has demonstrated its effectiveness to boost the performance of graph neural networks (GNNs), where its goal is to distill knowledge from a deeper teacher GNN into a shallower student GNN. However, it is actually…

Machine Learning · Computer Science 2023-03-28 Kaituo Feng, Changsheng Li, Ye Yuan, Guoren Wang

Correlation Congruence for Knowledge Distillation

Most teacher-student frameworks based on knowledge distillation (KD) depend on a strong congruent constraint on instance level. However, they usually ignore the correlation between multiple instances, which is also valuable for knowledge…

Computer Vision and Pattern Recognition · Computer Science 2019-04-04 Baoyun Peng, Xiao Jin, Jiaheng Liu, Shunfeng Zhou, Yichao Wu, Yu Liu, Dongsheng Li, Zhaoning Zhang