Related papers: Efficient Audio Captioning with Encoder-Level Know…

Knowledge Distillation from Multiple Foundation Models for End-to-End Speech Recognition

Although large foundation models pre-trained by self-supervised learning have achieved state-of-the-art performance in many tasks including automatic speech recognition (ASR), knowledge distillation (KD) is often required in practice to…

Audio and Speech Processing · Electrical Eng. & Systems 2023-03-21 Xiaoyu Yang, Qiujia Li, Chao Zhang, Philip C. Woodland

Inter-KD: Intermediate Knowledge Distillation for CTC-Based Automatic Speech Recognition

Recently, the advance in deep learning has brought a considerable improvement in the end-to-end speech recognition field, simplifying the traditional pipeline while producing promising results. Among the end-to-end models, the connectionist…

Audio and Speech Processing · Electrical Eng. & Systems 2022-11-29 Ji Won Yoon, Beom Jun Woo, Sunghwan Ahn, Hyeonseung Lee, Nam Soo Kim

Creating a Good Teacher for Knowledge Distillation in Acoustic Scene Classification

Knowledge Distillation (KD) is a widespread technique for compressing the knowledge of large models into more compact and efficient models. KD has proved to be highly effective in building well-performing low-complexity Acoustic Scene…

Sound · Computer Science 2025-03-17 Tobias Morocutti, Florian Schmid, Khaled Koutini, Gerhard Widmer

Knowledge Distillation from Non-streaming to Streaming ASR Encoder using Auxiliary Non-streaming Layer

Streaming automatic speech recognition (ASR) models are restricted from accessing future context, which results in worse performance compared to the non-streaming models. To improve the performance of streaming ASR, knowledge distillation…

Computation and Language · Computer Science 2023-09-01 Kyuhong Shim, Jinkyu Lee, Simyung Chang, Kyuwoong Hwang

Lightweight Neural Network with Knowledge Distillation for CSI Feedback

Deep learning has shown promise in enhancing channel state information (CSI) feedback. However, many studies indicate that better feedback performance often accompanies higher computational complexity. Pursuing better performance-complexity…

Signal Processing · Electrical Eng. & Systems 2024-03-05 Yiming Cui, Jiajia Guo, Zheng Cao, Huaze Tang, Chao-Kai Wen, Shi Jin, Xin Wang, Xiaolin Hou

Adaptive Knowledge Distillation for Device-Directed Speech Detection

Device-directed speech detection (DDSD) is a binary classification task that separates the user's queries to a voice assistant (VA) from background speech or side conversations. This is important for achieving naturalistic user experience.…

Sound · Computer Science 2025-08-06 Hyung Gun Chi, Florian Pesce, Wonil Chang, Oggi Rudovic, Arturo Argueta, Stefan Braun, Vineet Garg, Ahmed Hussen Abdelaziz

CED: Consistent ensemble distillation for audio tagging

Augmentation and knowledge distillation (KD) are well-established techniques employed in audio classification tasks, aimed at enhancing performance and reducing model sizes on the widely recognized Audioset (AS) benchmark. Although both…

Sound · Computer Science 2023-09-11 Heinrich Dinkel, Yongqing Wang, Zhiyong Yan, Junbo Zhang, Yujun Wang

Knowledge Distillation for Efficient Audio-Visual Video Captioning

Automatically describing audio-visual content with texts, namely video captioning, has received significant attention due to its potential applications across diverse fields. Deep neural networks are the dominant methods, offering…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-19 Özkan Çaylı, Xubo Liu, Volkan Kılıç, Wenwu Wang

Knowledge Distillation for Speech Denoising by Latent Representation Alignment with Cosine Distance

Speech denoising is a generally adopted and impactful task, appearing in many common and everyday-life use cases. Although there are very powerful methods published, most of those are too complex for deployment in everyday and low-resources…

Sound · Computer Science 2025-05-07 Diep Luong, Mikko Heikkinen, Konstantinos Drossos, Tuomas Virtanen

Co-training and Co-distillation for Quality Improvement and Compression of Language Models

Knowledge Distillation (KD) compresses computationally expensive pre-trained language models (PLMs) by transferring their knowledge to smaller models, allowing their use in resource-constrained or real-time settings. However, most smaller…

Computation and Language · Computer Science 2023-11-08 Hayeon Lee, Rui Hou, Jongpil Kim, Davis Liang, Hongbo Zhang, Sung Ju Hwang, Alexander Min

Guiding Frame-Level CTC Alignments Using Self-knowledge Distillation

Transformer encoder with connectionist temporal classification (CTC) framework is widely used for automatic speech recognition (ASR). However, knowledge distillation (KD) for ASR displays a problem of disagreement between teacher-student…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-13 Eungbeom Kim, Hantae Kim, Kyogu Lee

Revisiting Knowledge Distillation for Autoregressive Language Models

Knowledge distillation (KD) is a common approach to compress a teacher model to reduce its inference cost and memory footprint, by training a smaller student model. However, in the context of autoregressive language models (LMs), we…

Computation and Language · Computer Science 2024-06-18 Qihuang Zhong, Liang Ding, Li Shen, Juhua Liu, Bo Du, Dacheng Tao

Feature Alignment and Representation Transfer in Knowledge Distillation for Large Language Models

Knowledge distillation (KD) is a technique for transferring knowledge from complex teacher models to simpler student models, significantly enhancing model efficiency and accuracy. It has demonstrated substantial advancements in various…

Computation and Language · Computer Science 2025-04-21 Junjie Yang, Junhao Song, Xudong Han, Ziqian Bi, Tianyang Wang, Chia Xin Liang, Xinyuan Song, Yichao Zhang, Qian Niu, Benji Peng, Keyu Chen, Ming Liu

Active Data Curation Effectively Distills Large-Scale Multimodal Models

Knowledge distillation (KD) is the de facto standard for compressing large-scale models into smaller ones. Prior works have explored ever more complex KD strategies involving different objective functions, teacher-ensembles, and weight…

Computer Vision and Pattern Recognition · Computer Science 2025-05-06 Vishaal Udandarao, Nikhil Parthasarathy, Muhammad Ferjad Naeem, Talfan Evans, Samuel Albanie, Federico Tombari, Yongqin Xian, Alessio Tonioni, Olivier J. Hénaff

Multi-stage Progressive Compression of Conformer Transducer for On-device Speech Recognition

The smaller memory bandwidth in smart devices prompts development of smaller Automatic Speech Recognition (ASR) models. To obtain a smaller model, one can employ the model compression techniques. Knowledge distillation (KD) is a popular…

Sound · Computer Science 2022-10-04 Jash Rathod, Nauman Dawalatabad, Shatrughan Singh, Dhananjaya Gowda

Context-Aware Knowledge Distillation with Adaptive Weighting for Image Classification

Knowledge distillation (KD) is a widely used technique to transfer knowledge from a large teacher network to a smaller student model. Traditional KD uses a fixed balancing factor alpha as a hyperparameter to combine the hard-label…

Computer Vision and Pattern Recognition · Computer Science 2025-09-09 Zhengda Li

Knowledge Distillation Beyond Model Compression

Knowledge distillation (KD) is commonly deemed as an effective model compression technique in which a compact model (student) is trained under the supervision of a larger pretrained model or an ensemble of models (teacher). Various…

Machine Learning · Computer Science 2020-07-08 Fahad Sarfraz, Elahe Arani, Bahram Zonooz

ACAM-KD: Adaptive and Cooperative Attention Masking for Knowledge Distillation

Dense visual prediction tasks, such as detection and segmentation, are crucial for time-critical applications (e.g., autonomous driving and video surveillance). While deep models achieve strong performance, their efficiency remains a…

Computer Vision and Pattern Recognition · Computer Science 2025-03-11 Qizhen Lan, Qing Tian

Improving Knowledge Distillation with Teacher's Explanation

Knowledge distillation (KD) improves the performance of a low-complexity student model with the help of a more powerful teacher. The teacher in KD is a black-box model, imparting knowledge to the student only through its predictions. This…

Machine Learning · Computer Science 2023-10-05 Sayantan Chowdhury, Ben Liang, Ali Tizghadam, Ilijc Albanese

Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding

Automated audio captioning (AAC) is an audio-to-text task to describe audio contents in natural language. Recently, the advancements in large language models (LLMs), with improvements in training approaches for audio encoders, have opened…

Sound · Computer Science 2024-06-26 Jizhong Liu, Gang Li, Junbo Zhang, Heinrich Dinkel, Yongqing Wang, Zhiyong Yan, Yujun Wang, Bin Wang