Related papers: Deep Classifier Mimicry without Data Access
The emergence of new spoofing attacks poses an increasing challenge to audio security. Current detection methods often falter when faced with unseen spoofing attacks. Traditional strategies, such as retraining with new data, are not always…
Neural networks can learn spurious correlations in the data, often leading to performance degradation for underrepresented subgroups. Studies have demonstrated that the disparity is amplified when knowledge is distilled from a complex…
Large-scale contrastive learning models can learn very informative sentence embeddings, but are hard to serve online due to the huge model size. Therefore, they often play the role of "teacher", transferring abilities to small "student"…
Knowledge Distillation (KD) has made remarkable progress in the last few years and become a popular paradigm for model compression and knowledge transfer. However, almost all existing KD algorithms are data-driven, i.e., relying on a large…
Knowledge distillation has been widely used to produce portable and efficient neural networks which can be well applied on edge devices for computer vision tasks. However, almost all top-performing knowledge distillation methods need to…
The ability to learn from incrementally arriving data is essential for any life-long learning system. However, standard deep neural networks forget the knowledge about the old tasks, a phenomenon called catastrophic forgetting, when trained…
Online Action Detection (OAD) systems face two primary challenges: high computational cost and insufficient modeling of discriminative temporal dynamics against background motion. Adding optical flow could provides strong motion cues but it…
Data-Free Knowledge Distillation (DFKD) enables the knowledge transfer from the given pre-trained teacher network to the target student model without access to the real training data. Existing DFKD methods focus primarily on improving image…
Data-free knowledge distillation~(DFKD) is an effective manner to solve model compression and transmission restrictions while retaining privacy protection, which has attracted extensive attention in recent years. Currently, the majority of…
Model inversion, whose goal is to recover training data from a pre-trained model, has been recently proved feasible. However, existing inversion methods usually suffer from the mode collapse problem, where the synthesized instances are…
Data-Free Knowledge Distillation (KD) allows knowledge transfer from a trained neural network (teacher) to a more compact one (student) in the absence of original training data. Existing works use a validation set to monitor the accuracy of…
To improve the performance of the dual-encoder retriever, one effective approach is knowledge distillation from the cross-encoder ranker. Existing works construct the candidate passages following the supervised learning setting where a…
Low-level database operators often admit multiple physical implementations ("kernels") that are semantically equivalent but have vastly different performance characteristics depending on the input data distribution. Existing database…
In the era of large scale pretrained models, Knowledge Distillation (KD) serves an important role in transferring the wisdom of computationally heavy teacher models to lightweight, efficient student models while preserving performance.…
Knowledge distillation (KD) is one of the prominent techniques for model compression. In this method, the knowledge of a large network (teacher) is distilled into a model (student) with usually significantly fewer parameters. KD tries to…
Large language models (LLMs) excel at processing long sequences, boosting demand for key-value (KV) caching. While recent efforts to evict KV cache have alleviated the inference burden, they often fail to allocate resources rationally…
Data-free knowledge distillation is a challenging model lightweight task for scenarios in which the original dataset is not available. Previous methods require a lot of extra computational costs to update one or more generators and their…
Large-scale language models have recently demonstrated impressive empirical performance. Nevertheless, the improved results are attained at the price of bigger models, more power consumption, and slower inference, which hinder their…
Continual relation extraction (CRE) aims to continuously train a model on data with new relations while avoiding forgetting old ones. Some previous work has proved that storing a few typical samples of old relations and replaying them when…
Quality Estimation (QE) is the task of automatically predicting Machine Translation quality in the absence of reference translations, making it applicable in real-time settings, such as translating online social media conversations. Recent…