Related papers: Fuzzy Knowledge Distillation from High-Order TSK t…
Deep neural networks (DNNs) demonstrate great success in classification tasks. However, they act as black boxes and we don't know how they make decisions in a particular classification task. To this end, we propose to distill the knowledge…
Representation learning has emerged as a crucial focus in machine and deep learning, involving the extraction of meaningful and useful features and patterns from the input data, thereby enhancing the performance of various downstream tasks…
Takagi-Sugeno-Kang (TSK) fuzzy systems are flexible and interpretable machine learning models; however, they may not be easily optimized when the data size is large, and/or the data dimensionality is high. This paper proposes a mini-batch…
Multi-label classification can effectively identify the relevant labels of an instance from a given set of labels. However, the modeling of the relationship between the features and the labels is critical to the classification performance.…
The superior interpretability and uncertainty modeling ability of Takagi-Sugeno-Kang fuzzy system (TSK FS) make it possible to describe complex nonlinear systems intuitively and efficiently. However, classical TSK FS usually adopts the…
Large language models (LLMs) offer impressive performance but are impractical for resource-constrained deployment due to high latency and energy consumption. Knowledge distillation (KD) addresses this by transferring knowledge from a large…
Knowledge Distillation (KD) aims to distill the knowledge of a cumbersome teacher model into a lightweight student model. Its success is generally attributed to the privileged information on similarities among categories provided by the…
Knowledge distillation (KD) is a widely adopted technique for transferring knowledge from a high-capacity teacher model to a smaller student model by aligning their output distributions. However, existing methods often underperform in…
Takagi-Sugeno-Kang (TSK) fuzzy systems are very useful machine learning models for regression problems. However, to our knowledge, there has been no efficient and effective training algorithm that ensures their generalization…
Clustering is an efficient and essential technique for exploring latent knowledge of data. However, limited attention has been given to the interpretability of the clusters detected by most clustering algorithms. In addition, due to the…
Model transparency, label correlation learning, and robustness to label noise are crucial for multilabel learning. However, few existing methods study these three characteristics simultaneously. To address this challenge, we propose the…
Regression analysis is employed to examine and quantify the relationships between input variables and a dependent and continuous output variable. It is widely used for predictive modelling in fields such as finance, healthcare, and…
Knowledge distillation aims to enhance the performance of a lightweight student model by exploiting the knowledge from a pre-trained cumbersome teacher model. However, in traditional knowledge distillation, teacher predictions are only…
This work introduces a novel knowledge distillation framework for classification tasks where information on existing subclasses is available and taken into consideration. In classification tasks with a small number of classes or binary…
Despite the advanced intelligence abilities of large language models (LLMs) in various applications, they still face significant computational and storage demands. Knowledge Distillation (KD) has emerged as an effective strategy to improve…
Knowledge Distillation (KD) uses the teacher's prediction logits as soft labels to guide the student, while self-KD does not need a real teacher to acquire the soft labels. This work unifies the formulations of the two tasks by decomposing…
Knowledge distillation (KD) is an effective model compression technique that transfers knowledge from a high-performance teacher to a lightweight student, reducing computational and storage costs while maintaining competitive accuracy.…
Knowledge distillation is a model compression technique in which a compact "student" network is trained to replicate the predictive behavior of a larger "teacher" network. In logit-based knowledge distillation, it has become the de facto…
Knowledge distillation (KD) has become an important technique for model compression and knowledge transfer. In this work, we first perform a comprehensive analysis of the knowledge transferred by different KD methods. We demonstrate that…
In knowledge distillation, the knowledge from the teacher model is often too complex for the student model to thoroughly process. However, good teachers in real life always simplify complex material before teaching it to students. Inspired…