Related papers: Mamba base PKD for efficient knowledge compression

Knowledge Distillation Framework for Accelerating High-Accuracy Neural Network-Based Molecular Dynamics Simulations

Neural network potentials (NNPs) offer a powerful alternative to traditional force fields for molecular dynamics (MD) simulations. Accurate and stable MD simulations, crucial for evaluating material properties, require training data…

Machine Learning · Computer Science 2025-06-23 Naoki Matsumura, Yuta Yoshimoto, Yuto Iwasaki, Meguru Yamazaki, Yasufumi Sakai

PaCKD: Pattern-Clustered Knowledge Distillation for Compressing Memory Access Prediction Models

Deep neural networks (DNNs) have proven to be effective models for accurate Memory Access Prediction (MAP), a critical task in mitigating memory latency through data prefetching. However, existing DNN-based MAP models suffer from the…

Machine Learning · Computer Science 2024-02-22 Neelesh Gupta, Pengmiao Zhang, Rajgopal Kannan, Viktor Prasanna

Unleashing the Potential of Mamba: Boosting a LiDAR 3D Sparse Detector by Using Cross-Model Knowledge Distillation

The LiDAR 3D object detector that strikes a balance between accuracy and speed is crucial for achieving real-time perception in autonomous driving. However, many existing LiDAR detection models depend on complex feature transformations,…

Computer Vision and Pattern Recognition · Computer Science 2026-03-31 Rui Yu, Runkai Zhao, Jiagen Li, Qingsong Zhao, HuaiCheng Yan, Meng Wang

Teacher-Student Architecture for Knowledge Distillation: A Survey

Although Deep neural networks (DNNs) have shown a strong capacity to solve large-scale problems in many areas, such DNNs are hard to be deployed in real-world systems due to their voluminous parameters. To tackle this issue, Teacher-Student…

Machine Learning · Computer Science 2023-08-09 Chengming Hu, Xuan Li, Dan Liu, Haolun Wu, Xi Chen, Ju Wang, Xue Liu

An Empirical Study of Knowledge Distillation for Code Understanding Tasks

Pre-trained language models (PLMs) have emerged as powerful tools for code understanding. However, deploying these PLMs in large-scale applications faces practical challenges due to their computational intensity and inference latency.…

Software Engineering · Computer Science 2025-08-22 Ruiqi Wang, Zezhou Yang, Cuiyun Gao, Xin Xia, Qing Liao

The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Linear RNN architectures, like Mamba, can be competitive with Transformer models in language modeling while having advantageous deployment characteristics. Given the focus on training large-scale Transformer models, we consider the…

Machine Learning · Computer Science 2025-06-30 Junxiong Wang, Daniele Paliotta, Avner May, Alexander M. Rush, Tri Dao

A Comprehensive Survey on Knowledge Distillation

Deep Neural Networks (DNNs) have achieved notable performance in the fields of computer vision and natural language processing with various applications in both academia and industry. However, with recent advancements in DNNs and…

Computer Vision and Pattern Recognition · Computer Science 2025-10-14 Amir M. Mansourian, Rozhan Ahmadi, Masoud Ghafouri, Amir Mohammad Babaei, Elaheh Badali Golezani, Zeynab Yasamani Ghamchi, Vida Ramezanian, Alireza Taherian, Kimia Dinashi, Amirali Miri, Shohreh Kasaei

Model compression via distillation and quantization

Deep neural networks (DNNs) continue to make significant advances, solving tasks from image classification to translation or reinforcement learning. One aspect of the field receiving considerable attention is efficiently executing deep…

Neural and Evolutionary Computing · Computer Science 2018-02-16 Antonio Polino, Razvan Pascanu, Dan Alistarh

Inplace knowledge distillation with teacher assistant for improved training of flexible deep neural networks

Deep neural networks (DNNs) have achieved great success in various machine learning tasks. However, most existing powerful DNN models are computationally expensive and memory demanding, hindering their deployment in devices with low memory…

Signal Processing · Electrical Eng. & Systems 2021-05-19 Alexey Ozerov, Ngoc Duong

Teacher-Student Architecture for Knowledge Learning: A Survey

Although Deep Neural Networks (DNNs) have shown a strong capacity to solve large-scale problems in many areas, such DNNs with voluminous parameters are hard to be deployed in a real-time system. To tackle this issue, Teacher-Student…

Machine Learning · Computer Science 2022-11-01 Chengming Hu, Xuan Li, Dan Liu, Xi Chen, Ju Wang, Xue Liu

Mamba4Net: Distilled Hybrid Mamba Large Language Models For Networking

Transformer-based large language models (LLMs) are increasingly being adopted in networking research to address domain-specific challenges. However, their quadratic time complexity and substantial model sizes often result in significant…

Networking and Internet Architecture · Computer Science 2025-10-21 Linhan Xia, Mingzhan Yang, Jingjing Wang, Ziwei Yan, Yakun Ren, Guo Yu, Kai Lei

Vision Mamba Distillation for Low-resolution Fine-grained Image Classification

Low-resolution fine-grained image classification has recently made significant progress, largely thanks to the super-resolution techniques and knowledge distillation methods. However, these approaches lead to an exponential increase in the…

Computer Vision and Pattern Recognition · Computer Science 2024-11-28 Yao Chen, Jiabao Wang, Peichao Wang, Rui Zhang, Yang Li

SFT-KD-Recon: Learning a Student-friendly Teacher for Knowledge Distillation in Magnetic Resonance Image Reconstruction

Deep cascaded architectures for magnetic resonance imaging (MRI) acceleration have shown remarkable success in providing high-quality reconstruction. However, as the number of cascades increases, the improvements in reconstruction tend to…

Image and Video Processing · Electrical Eng. & Systems 2024-02-06 Matcha Naga Gayathri, Sriprabha Ramanarayanan, Mohammad Al Fahim, Rahul G S, Keerthi Ram, Mohanasankar Sivaprakasam

An Empirical Study of Leveraging Knowledge Distillation for Compressing Multilingual Neural Machine Translation Models

Knowledge distillation (KD) is a well-known method for compressing neural models. However, works focusing on distilling knowledge from large multilingual neural machine translation (MNMT) models into smaller ones are practically…

Computation and Language · Computer Science 2023-04-20 Varun Gumma, Raj Dabre, Pratyush Kumar

HPM-KD: Hierarchical Progressive Multi-Teacher Framework for Knowledge Distillation and Efficient Model Compression

Knowledge Distillation (KD) has emerged as a promising technique for model compression but faces critical limitations: (1) sensitivity to hyperparameters requiring extensive manual tuning, (2) capacity gap when distilling from very large…

Machine Learning · Computer Science 2025-12-11 Gustavo Coelho Haase, Paulo Henrique Dourado da Silva

Knowledge Distillation with Feature Maps for Image Classification

The model reduction problem that eases the computation costs and latency of complex deep learning architectures has received an increasing number of investigations owing to its importance in model deployment. One promising method is…

Machine Learning · Computer Science 2018-12-04 Wei-Chun Chen, Chia-Che Chang, Chien-Yu Lu, Che-Rung Lee

MKD: a Multi-Task Knowledge Distillation Approach for Pretrained Language Models

Pretrained language models have led to significant performance gains in many NLP tasks. However, the intensive computing resources to train such models remain an issue. Knowledge distillation alleviates this problem by learning a…

Computation and Language · Computer Science 2020-05-04 Linqing Liu, Huan Wang, Jimmy Lin, Richard Socher, Caiming Xiong

DLRMamba: Distilling Low-Rank Mamba for Edge Multispectral Fusion Object Detection

Multispectral fusion object detection is a critical task for edge-based maritime surveillance and remote sensing, demanding both high inference efficiency and robust feature representation for high-resolution inputs. However, current State…

Computer Vision and Pattern Recognition · Computer Science 2026-03-10 Qianqian Zhang, Leon Tabaro, Ahmed M. Abdelmoniem, Junshe An

Attention to Mamba: A Recipe for Cross-Architecture Distillation

State Space Models (SSMs) such as Mamba have become a popular alternative to Transformer models, due to their reduced memory consumption and higher throughput at generation compared to their Attention-based counterparts. On the other hand,…

Computation and Language · Computer Science 2026-04-17 Abhinav Moudgil, Ningyuan Huang, Eeshan Gunesh Dhekane, Pau Rodríguez, Luca Zappella, Federico Danieli

Meta-KD: A Meta Knowledge Distillation Framework for Language Model Compression across Domains

Pre-trained language models have been applied to various NLP tasks with considerable performance gains. However, the large model sizes, together with the long inference time, limit the deployment of such models in real-time applications.…

Computation and Language · Computer Science 2022-11-03 Haojie Pan, Chengyu Wang, Minghui Qiu, Yichang Zhang, Yaliang Li, Jun Huang