Related papers: Robust Active Distillation

Can a student Large Language Model perform as well as it's teacher?

The burgeoning complexity of contemporary deep learning models, while achieving unparalleled accuracy, has inadvertently introduced deployment challenges in resource-constrained environments. Knowledge distillation, a technique aiming to…

Machine Learning · Computer Science 2023-10-05 Sia Gholami, Marwan Omar

uDistil-Whisper: Label-Free Data Filtering for Knowledge Distillation in Low-Data Regimes

Recent work on distilling Whisper's knowledge into small models using pseudo-labels shows promising performance while reducing the size by up to 50%. This results in small, efficient, and dedicated models. However, a critical step of…

Computation and Language · Computer Science 2025-05-16 Abdul Waheed, Karima Kadaoui, Bhiksha Raj, Muhammad Abdul-Mageed

Rethinking Soft Labels for Knowledge Distillation: A Bias-Variance Tradeoff Perspective

Knowledge distillation is an effective approach to leverage a well-trained network or an ensemble of them, named as the teacher, to guide the training of a student network. The outputs from the teacher network are used as soft labels for…

Machine Learning · Computer Science 2021-02-02 Helong Zhou, Liangchen Song, Jiajie Chen, Ye Zhou, Guoli Wang, Junsong Yuan, Qian Zhang

Weighted Distillation with Unlabeled Examples

Distillation with unlabeled examples is a popular and powerful method for training deep neural networks in settings where the amount of labeled data is limited: A large "teacher" neural network is trained on the labeled data available,…

Machine Learning · Computer Science 2022-10-14 Fotis Iliopoulos, Vasilis Kontonis, Cenk Baykal, Gaurav Menghani, Khoa Trinh, Erik Vee

ReffAKD: Resource-efficient Autoencoder-based Knowledge Distillation

In this research, we propose an innovative method to boost Knowledge Distillation efficiency without the need for resource-heavy teacher models. Knowledge Distillation trains a smaller "student" model with guidance from a larger…

Machine Learning · Computer Science 2024-04-16 Divyang Doshi, Jung-Eun Kim

Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher

Knowledge distillation is a strategy of training a student network with guide of the soft output from a teacher network. It has been a successful method of model compression and knowledge transfer. However, currently knowledge distillation…

Machine Learning · Computer Science 2024-10-21 Guangda Ji, Zhanxing Zhu

Knowledge Distillation from Internal Representations

Knowledge distillation is typically conducted by training a small model (the student) to mimic a large and cumbersome model (the teacher). The idea is to compress the knowledge from the teacher by using its output probabilities as…

Computation and Language · Computer Science 2020-01-17 Gustavo Aguilar, Yuan Ling, Yu Zhang, Benjamin Yao, Xing Fan, Chenlei Guo

Make a Strong Teacher with Label Assistance: A Novel Knowledge Distillation Approach for Semantic Segmentation

In this paper, we introduce a novel knowledge distillation approach for the semantic segmentation task. Unlike previous methods that rely on power-trained teachers or other modalities to provide additional knowledge, our approach does not…

Computer Vision and Pattern Recognition · Computer Science 2024-07-19 Shoumeng Qiu, Jie Chen, Xinrun Li, Ru Wan, Xiangyang Xue, Jian Pu

Explicit and Implicit Knowledge Distillation via Unlabeled Data

Data-free knowledge distillation is a challenging model lightweight task for scenarios in which the original dataset is not available. Previous methods require a lot of extra computational costs to update one or more generators and their…

Computer Vision and Pattern Recognition · Computer Science 2023-02-24 Yuzheng Wang, Zuhao Ge, Zhaoyu Chen, Xian Liu, Chuangjia Ma, Yunquan Sun, Lizhe Qi

Learning From Biased Soft Labels

Knowledge distillation has been widely adopted in a variety of tasks and has achieved remarkable successes. Since its inception, many researchers have been intrigued by the dark knowledge hidden in the outputs of the teacher model.…

Machine Learning · Computer Science 2023-02-17 Hua Yuan, Ning Xu, Yu Shi, Xin Geng, Yong Rui

A Studious Approach to Semi-Supervised Learning

The problem of learning from few labeled examples while using large amounts of unlabeled data has been approached by various semi-supervised methods. Although these methods can achieve superior performance, the models are often not…

Computer Vision and Pattern Recognition · Computer Science 2021-09-21 Sahil Khose, Shruti Jain, V Manushree

Dataset distillation for memorized data: Soft labels can leak held-out teacher knowledge

Dataset distillation aims to compress training data into fewer examples via a teacher, from which a student can learn effectively. While its success is often attributed to structure in the data, modern neural networks also memorize specific…

Machine Learning · Computer Science 2026-02-23 Freya Behrens, Lenka Zdeborová

Progressive Label Distillation: Learning Input-Efficient Deep Neural Networks

Much of the focus in the area of knowledge distillation has been on distilling knowledge from a larger teacher network to a smaller student network. However, there has been little research on how the concept of distillation can be leveraged…

Neural and Evolutionary Computing · Computer Science 2019-01-29 Zhong Qiu Lin, Alexander Wong

Efficient Knowledge Distillation: Empowering Small Language Models with Teacher Model Insights

Enhancing small language models for real-life application deployment is a significant challenge facing the research community. Due to the difficulties and costs of using large language models, researchers are seeking ways to effectively…

Computation and Language · Computer Science 2024-09-20 Mohamad Ballout, Ulf Krumnack, Gunther Heidemann, Kai-Uwe Kühnberger

SLaM: Student-Label Mixing for Distillation with Unlabeled Examples

Knowledge distillation with unlabeled examples is a powerful training paradigm for generating compact and lightweight student models in applications where the amount of labeled data is limited but one has access to a large pool of unlabeled…

Machine Learning · Computer Science 2023-06-12 Vasilis Kontonis, Fotis Iliopoulos, Khoa Trinh, Cenk Baykal, Gaurav Menghani, Erik Vee

A Label is Worth a Thousand Images in Dataset Distillation

Data quality is a crucial factor in the performance of machine learning models, a principle that dataset distillation methods exploit by compressing training datasets into much smaller counterparts that maintain similar…

Machine Learning · Computer Science 2025-01-22 Tian Qin, Zhiwei Deng, David Alvarez-Melis

Selective Cross-Task Distillation

The outpouring of various pre-trained models empowers knowledge distillation by providing abundant teacher resources, but there lacks a developed mechanism to utilize these teachers adequately. With a massive model repository composed of…

Machine Learning · Computer Science 2022-09-29 Su Lu, Han-Jia Ye, De-Chuan Zhan

Reinforced Multi-Teacher Selection for Knowledge Distillation

In natural language processing (NLP) tasks, slow inference speed and huge footprints in GPU usage remain the bottleneck of applying pre-trained deep models in production. As a popular method for model compression, knowledge distillation…

Computation and Language · Computer Science 2020-12-15 Fei Yuan, Linjun Shou, Jian Pei, Wutao Lin, Ming Gong, Yan Fu, Daxin Jiang

Parameter-Free Logit Distillation via Sorting Mechanism

Knowledge distillation (KD) aims to distill the knowledge from the teacher (larger) to the student (smaller) model via soft-label for the efficient neural network. In general, the performance of a model is determined by accuracy, which is…

Signal Processing · Electrical Eng. & Systems 2025-08-25 Stephen Ekaputra Limantoro

Improving Neural Topic Models with Wasserstein Knowledge Distillation

Topic modeling is a dominant method for exploring document collections on the web and in digital libraries. Recent approaches to topic modeling use pretrained contextualized language models and variational autoencoders. However, large…

Computation and Language · Computer Science 2024-06-21 Suman Adhya, Debarshi Kumar Sanyal