Malware Classification from Memory Dumps Using Machine Learning, Transformers, and Large Language Models

Areej Dweib; Montaser Tanina; Shehab Alawi; Mohammad Dyab; Huthaifa I. Ashqar

Machine Learning; Computation and Language; Cryptography and Security. 2025-03-05, v1.

Abstract

This study compares classification models on a malware classification task across several feature sets and data configurations. Six traditional models (Logistic Regression, K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Decision Trees, Random Forest (RF), and Extreme Gradient Boosting (XGB)) were evaluated alongside two deep learning models, Recurrent Neural Networks (RNN) and Transformers, as well as Gemini in zero-shot and few-shot settings. Four configurations were tested: All Features, Literature Review Features, the Top 45 Features from RF, and Down-Sampled with Top 45 Features. XGB achieved the highest accuracy, 87.42%, using the Top 45 Features, and RF followed closely with 87.23% on the same feature set. The deep learning models underperformed: the RNN reached 66.71% accuracy and the Transformer 71.59%. Down-sampling reduced performance across all models, with XGB dropping to 81.31%. Gemini showed the lowest performance, with accuracies of 40.65% (zero-shot) and 48.65% (few-shot). The results highlight the importance of feature selection, which improved model performance while reducing computational complexity. Traditional models such as XGB and RF outperformed both the deep learning models and the zero-shot and few-shot methods. The study underscores the effectiveness of traditional machine learning models on structured datasets and provides a foundation for future research into hybrid approaches and larger datasets.
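The two data-preparation ideas named in the abstract, keeping only the top-ranked features and down-sampling, can be illustrated with a minimal sketch. This is not the authors' code: the feature names, importance scores, class labels, and the helper functions `top_k_features`, `project`, and `downsample` are all hypothetical, and the abstract does not state how down-sampling was performed. The sketch assumes random under-sampling of every class to the size of the smallest class, and assumes the importance scores would in practice come from a fitted random forest (for example the `feature_importances_` attribute of a scikit-learn `RandomForestClassifier`).

```python
import random
from collections import Counter, defaultdict


def top_k_features(importances, k):
    """Return the names of the k features with the highest importance scores."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def project(rows, feature_names, keep):
    """Keep only the columns listed in `keep`, in that order."""
    idx = [feature_names.index(name) for name in keep]
    return [[row[i] for i in idx] for row in rows]


def downsample(rows, labels, seed=0):
    """Randomly under-sample every class to the size of the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: 4 features, 10 samples, 2 imbalanced classes.
feature_names = ["f0", "f1", "f2", "f3"]
importances = {"f0": 0.10, "f1": 0.45, "f2": 0.05, "f3": 0.40}  # made-up scores
rows = [[i, i * 2, i * 3, i * 4] for i in range(10)]
labels = ["class_a"] * 7 + ["class_b"] * 3

keep = top_k_features(importances, k=2)        # the paper uses k=45
reduced = project(rows, feature_names, keep)   # drop low-importance columns
balanced_rows, balanced_labels = downsample(reduced, labels)

print(keep)
print(Counter(balanced_labels))
```

Under-sampling discards majority-class samples, so the balanced set is smaller than the original. That loss of training data is one plausible reason for an accuracy drop like the one the abstract reports, though the abstract itself does not give a cause.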

Keywords

deep learning for image classification; malware detection; deep learning

Cite

@article{arxiv.2503.02144,
  title  = {Malware Classification from Memory Dumps Using Machine Learning, Transformers, and Large Language Models},
  author = {Areej Dweib and Montaser Tanina and Shehab Alawi and Mohammad Dyab and Huthaifa I. Ashqar},
  journal= {arXiv preprint arXiv:2503.02144},
  year   = {2025}
}
