Related papers: Generative Adversarial Simulator
Knowledge distillation is the technique of compressing a larger neural network, known as the teacher, into a smaller neural network, known as the student, while still trying to maintain the performance of the larger neural network as much…
Knowledge Distillation (KD) has made remarkable progress in the last few years and become a popular paradigm for model compression and knowledge transfer. However, almost all existing KD algorithms are data-driven, i.e., relying on a large…
Knowledge distillation is used, in generative language modeling, to train a smaller student model using the help of a larger teacher model, resulting in improved capabilities for the student model. In this paper, we formulate a more general…
Knowledge distillation refers to a technique of transferring the knowledge from a large learned model or an ensemble of learned models to a small model. This method relies on access to the original training set, which might not always be…
Knowledge distillation aims to compress a powerful yet cumbersome teacher model into a lightweight student model without much sacrifice of performance. For this purpose, various approaches have been proposed over the past few years,…
Knowledge distillation has been widely used to produce portable and efficient neural networks which can be well applied on edge devices for computer vision tasks. However, almost all top-performing knowledge distillation methods need to…
Data-free knowledge distillation aims to learn a compact student network from a pre-trained large teacher network without using the original training data of the teacher network. Existing collection-based and generation-based methods train…
Rehearsal-based video incremental learning often employs knowledge distillation to mitigate catastrophic forgetting of previously learned data. However, this method faces two major challenges for video task: substantial computing resources…
In natural language processing (NLP) tasks, slow inference speed and huge footprints in GPU usage remain the bottleneck of applying pre-trained deep models in production. As a popular method for model compression, knowledge distillation…
Knowledge distillation is one of the most popular and effective techniques for knowledge transfer, model compression and semi-supervised learning. Most existing distillation approaches require the access to original or augmented training…
Policy distillation, which transfers a teacher policy to a student policy has achieved great success in challenging tasks of deep reinforcement learning. This teacher-student framework requires a well-trained teacher model which is…
Knowledge distillation addresses the problem of transferring knowledge from a teacher model to a student model. In this process, we typically have multiple types of knowledge extracted from the teacher model. The problem is to make full use…
Knowledge distillation deals with the problem of training a smaller model (Student) from a high capacity source model (Teacher) so as to retain most of its performance. Existing approaches use either the training data or meta-data extracted…
Network quantization is an essential procedure in deep learning for development of efficient fixed-point inference models on mobile or edge platforms. However, as datasets grow larger and privacy regulations become stricter, data sharing…
Policy distillation in deep reinforcement learning provides an effective way to transfer control policies from a larger network to a smaller untrained network without a significant degradation in performance. However, policy distillation is…
Recent advances in deep learning has lead to rapid developments in the field of image retrieval. However, the best performing architectures incur significant computational cost. Recent approaches tackle this issue using knowledge…
We propose the task of knowledge distillation detection, which aims to determine whether a student model has been distilled from a given teacher, under a practical setting where only the student's weights and the teacher's API are…
Pre-trained language models have become an integral component of question-answering systems, achieving remarkable performance. However, for practical deployment, it is crucial to perform knowledge distillation to maintain high performance…
Knowledge distillation is normally used to compress a big network, or teacher, onto a smaller one, the student, by training it to match its outputs. Recently, some works have shown that robustness against adversarial attacks can also be…
Convolutional neural networks (CNNs) excel in computer vision but are susceptible to adversarial attacks, crafted perturbations designed to mislead predictions. Despite advances in adversarial training, a gap persists between model accuracy…