Related papers: Deep Classifier Mimicry without Data Access

Advancing Continual Learning for Robust Deepfake Audio Classification

The emergence of new spoofing attacks poses an increasing challenge to audio security. Current detection methods often falter when faced with unseen spoofing attacks. Traditional strategies, such as retraining with new data, are not always…

Audio and Speech Processing · Electrical Eng. & Systems 2024-07-16 Feiyi Dong, Qingchen Tang, Yichen Bai, Zihan Wang

Adaptive Group Robust Ensemble Knowledge Distillation

Neural networks can learn spurious correlations in the data, often leading to performance degradation for underrepresented subgroups. Studies have demonstrated that the disparity is amplified when knowledge is distilled from a complex…

Machine Learning · Computer Science 2025-11-11 Patrik Kenfack, Ulrich Aïvodji, Samira Ebrahimi Kahou

DistilCSE: Effective Knowledge Distillation For Contrastive Sentence Embeddings

Large-scale contrastive learning models can learn very informative sentence embeddings, but are hard to serve online due to the huge model size. Therefore, they often play the role of "teacher", transferring abilities to small "student"…

Artificial Intelligence · Computer Science 2023-01-31 Chaochen Gao, Xing Wu, Peng Wang, Jue Wang, Liangjun Zang, Zhongyuan Wang, Songlin Hu

Data-Free Adversarial Distillation

Knowledge Distillation (KD) has made remarkable progress in the last few years and become a popular paradigm for model compression and knowledge transfer. However, almost all existing KD algorithms are data-driven, i.e., relying on a large…

Machine Learning · Computer Science 2020-03-03 Gongfan Fang, Jie Song, Chengchao Shen, Xinchao Wang, Da Chen, Mingli Song

Dual Discriminator Adversarial Distillation for Data-free Model Compression

Knowledge distillation has been widely used to produce portable and efficient neural networks which can be well applied on edge devices for computer vision tasks. However, almost all top-performing knowledge distillation methods need to…

Computer Vision and Pattern Recognition · Computer Science 2021-10-06 Haoran Zhao, Xin Sun, Junyu Dong, Hui Yu, Huiyu Zhou

Distillation Techniques for Pseudo-rehearsal Based Incremental Learning

The ability to learn from incrementally arriving data is essential for any life-long learning system. However, standard deep neural networks forget the knowledge about the old tasks, a phenomenon called catastrophic forgetting, when trained…

Computer Vision and Pattern Recognition · Computer Science 2018-07-12 Haseeb Shah, Khurram Javed, Faisal Shafait

CAKE: Real-time Action Detection via Motion Distillation and Background-aware Contrastive Learning

Online Action Detection (OAD) systems face two primary challenges: high computational cost and insufficient modeling of discriminative temporal dynamics against background motion. Adding optical flow could provide strong motion cues but it…

Computer Vision and Pattern Recognition · Computer Science 2026-03-26 Hieu Hoang, Dung Trung Tran, Hong Nguyen, Nam-Phong Nguyen

CAE-DFKD: Bridging the Transferability Gap in Data-Free Knowledge Distillation

Data-Free Knowledge Distillation (DFKD) enables the knowledge transfer from the given pre-trained teacher network to the target student model without access to the real training data. Existing DFKD methods focus primarily on improving image…

Computer Vision and Pattern Recognition · Computer Science 2025-05-01 Zherui Zhang, Changwei Wang, Rongtao Xu, Wenhao Xu, Shibiao Xu, Yu Zhang, Li Guo

Conditional Pseudo-Supervised Contrast for Data-Free Knowledge Distillation

Data-free knowledge distillation (DFKD) is an effective manner to solve model compression and transmission restrictions while retaining privacy protection, which has attracted extensive attention in recent years. Currently, the majority of…

Machine Learning · Computer Science 2025-10-07 Renrong Shao, Wei Zhang, Jun Wang

Contrastive Model Inversion for Data-Free Knowledge Distillation

Model inversion, whose goal is to recover training data from a pre-trained model, has been recently proved feasible. However, existing inversion methods usually suffer from the mode collapse problem, where the synthesized instances are…

Artificial Intelligence · Computer Science 2021-05-19 Gongfan Fang, Jie Song, Xinchao Wang, Chengchao Shen, Xingen Wang, Mingli Song

Robust and Resource-Efficient Data-Free Knowledge Distillation by Generative Pseudo Replay

Data-Free Knowledge Distillation (KD) allows knowledge transfer from a trained neural network (teacher) to a more compact one (student) in the absence of original training data. Existing works use a validation set to monitor the accuracy of…

Machine Learning · Computer Science 2024-07-30 Kuluhan Binici, Shivam Aggarwal, Nam Trung Pham, Karianto Leman, Tulika Mitra

Adam: Dense Retrieval Distillation with Adaptive Dark Examples

To improve the performance of the dual-encoder retriever, one effective approach is knowledge distillation from the cross-encoder ranker. Existing works construct the candidate passages following the supervised learning setting where a…

Computation and Language · Computer Science 2024-06-07 Chongyang Tao, Chang Liu, Tao Shen, Can Xu, Xiubo Geng, Binxing Jiao, Daxin Jiang

Piece of CAKE: Adaptive Execution Engines via Microsecond-Scale Learning

Low-level database operators often admit multiple physical implementations ("kernels") that are semantically equivalent but have vastly different performance characteristics depending on the input data distribution. Existing database…

Databases · Computer Science 2026-02-05 Zijie Zhao, Ryan Marcus

Comparative Knowledge Distillation

In the era of large scale pretrained models, Knowledge Distillation (KD) serves an important role in transferring the wisdom of computationally heavy teacher models to lightweight, efficient student models while preserving performance.…

Machine Learning · Computer Science 2023-11-07 Alex Wilf, Alex Tianyi Xu, Paul Pu Liang, Alexander Obolenskiy, Daniel Fried, Louis-Philippe Morency

Improved knowledge distillation by utilizing backward pass knowledge in neural networks

Knowledge distillation (KD) is one of the prominent techniques for model compression. In this method, the knowledge of a large network (teacher) is distilled into a model (student) with usually significantly fewer parameters. KD tries to…

Machine Learning · Computer Science 2023-01-31 Aref Jafari, Mehdi Rezagholizadeh, Ali Ghodsi

CAKE: Cascading and Adaptive KV Cache Eviction with Layer Preferences

Large language models (LLMs) excel at processing long sequences, boosting demand for key-value (KV) caching. While recent efforts to evict KV cache have alleviated the inference burden, they often fail to allocate resources rationally…

Computation and Language · Computer Science 2025-12-25 Ziran Qin, Yuchen Cao, Mingbao Lin, Wen Hu, Shixuan Fan, Ke Cheng, Weiyao Lin, Jianguo Li

Explicit and Implicit Knowledge Distillation via Unlabeled Data

Data-free knowledge distillation is a challenging model lightweight task for scenarios in which the original dataset is not available. Previous methods require a lot of extra computational costs to update one or more generators and their…

Computer Vision and Pattern Recognition · Computer Science 2023-02-24 Yuzheng Wang, Zuhao Ge, Zhaoyu Chen, Xian Liu, Chuangjia Ma, Yunquan Sun, Lizhe Qi

MixKD: Towards Efficient Distillation of Large-scale Language Models

Large-scale language models have recently demonstrated impressive empirical performance. Nevertheless, the improved results are attained at the price of bigger models, more power consumption, and slower inference, which hinder their…

Computation and Language · Computer Science 2021-03-18 Kevin J Liang, Weituo Hao, Dinghan Shen, Yufan Zhou, Weizhu Chen, Changyou Chen, Lawrence Carin

Consistent Representation Learning for Continual Relation Extraction

Continual relation extraction (CRE) aims to continuously train a model on data with new relations while avoiding forgetting old ones. Some previous work has proved that storing a few typical samples of old relations and replaying them when…

Computation and Language · Computer Science 2022-05-24 Kang Zhao, Hua Xu, Jiangong Yang, Kai Gao

Knowledge Distillation for Quality Estimation

Quality Estimation (QE) is the task of automatically predicting Machine Translation quality in the absence of reference translations, making it applicable in real-time settings, such as translating online social media conversations. Recent…

Computation and Language · Computer Science 2021-07-02 Amit Gajbhiye, Marina Fomicheva, Fernando Alva-Manchego, Frédéric Blain, Abiola Obamuyide, Nikolaos Aletras, Lucia Specia