Related papers: MalwarePT: A Binary-Level Foundation Model for Mal…

Binary BPE: A Family of Cross-Platform Tokenizers for Binary Analysis

Sequence models for binary analysis are bottlenecked by byte-level tokenization: raw bytes waste precious context window capacity for transformers and other neural network architectures, and many existing text-oriented tokenizers fail on…

Machine Learning · Computer Science 2025-11-25 Michael J. Bommarito

An Ensemble of Pre-trained Transformer Models For Imbalanced Multiclass Malware Classification

Classification of malware families is crucial for a comprehensive understanding of how they can infect devices, computers, or systems. Thus, malware identification enables security researchers and incident responders to take precautions…

Cryptography and Security · Computer Science 2022-06-23 Ferhat Demirkıran, Aykut Çayır, Uğur Ünal, Hasan Dağ

Activation Analysis of a Byte-Based Deep Neural Network for Malware Classification

Feature engineering is one of the most costly aspects of developing effective machine learning models, and that cost is even greater in specialized problem domains, like malware classification, where expert skills are necessary to identify…

Machine Learning · Computer Science 2019-08-02 Scott E. Coull, Christopher Gardner

MalBERT: Using Transformers for Cybersecurity and Malicious Software Detection

In recent years we have witnessed an increase in cyber threats and malicious software attacks on different platforms with important consequences to persons and businesses. It has become critical to find automated machine learning techniques…

Cryptography and Security · Computer Science 2021-03-08 Abir Rahali, Moulay A. Akhloufi

Towards an Automated Pipeline for Detecting and Classifying Malware through Machine Learning

The constant growth in the number of malware - software or code fragment potentially harmful for computers and information networks - and the use of sophisticated evasion and obfuscation techniques have seriously hindered classic…

Cryptography and Security · Computer Science 2021-06-11 Nicola Loi, Claudio Borile, Daniele Ucci

Semantic Preprocessing for LLM-based Malware Analysis

In a context of malware analysis, numerous approaches rely on Artificial Intelligence to handle a large volume of data. However, these techniques focus on data view (images, sequences) and not on an expert's view. Noticing this issue, we…

Cryptography and Security · Computer Science 2025-10-06 Benjamin Marais, Tony Quertier, Grégoire Barrue

Efficient Malware Analysis Using Metric Embeddings

In this paper, we explore the use of metric learning to embed Windows PE files in a low-dimensional vector space for downstream use in a variety of applications, including malware detection, family classification, and malware attribute…

Machine Learning · Computer Science 2022-12-07 Ethan M. Rudd, David Krisiloff, Scott Coull, Daniel Olszewski, Edward Raff, James Holt

A multi-task learning model for malware classification with useful file access pattern from API call sequence

Based on API call sequences, semantic-aware and machine learning (ML) based malware classifiers can be built for malware detection or classification. Previous works concentrate on crafting and extracting various features from malware…

Sound · Computer Science 2016-10-20 Xin Wang, Siu Ming Yiu

Foundational Models for Malware Embeddings Using Spatio-Temporal Parallel Convolutional Networks

In today's interconnected digital landscape, the proliferation of malware poses a significant threat to the security and stability of computer networks and systems worldwide. As the complexity of malicious tactics, techniques, and…

Cryptography and Security · Computer Science 2023-05-26 Dhruv Nandakumar, Devin Quinn, Elijah Soba, Eunyoung Kim, Christopher Redino, Chris Chan, Kevin Choi, Abdul Rahman, Edward Bowen

Train It and Forget It: Merge Lists are Unnecessary for BPE Inference in Language Models

Standard Byte-Pair Encoding (BPE) tokenization compresses text by pairing a learned token vocabulary with a detailed merge list. Recent work has shown that this merge list exposes a potential attack surface for extracting information about…

Computation and Language · Computer Science 2025-08-12 Tomohiro Sawada, Kartik Goyal

Scalable APT Malware Classification via Parallel Feature Extraction and GPU-Accelerated Learning

This paper presents an underlying framework for both automating and accelerating malware classification, more specifically, mapping malicious executables to known Advanced Persistent Threat (APT) groups. The main feature of this analysis is…

Cryptography and Security · Computer Science 2025-04-23 Noah Subedar, Taeui Kim, Saathwick Venkataramalingam

Multimodal Techniques for Malware Classification

The threat of malware is a serious concern for computer networks and systems, highlighting the need for accurate classification techniques. In this research, we experiment with multimodal machine learning approaches for malware…

Cryptography and Security · Computer Science 2025-01-22 Jonathan Jiang, Mark Stamp

On the Effectiveness of Binary Emulation in Malware Classification

Malware authors are continuously evolving their code base to include counter-analysis methods that can significantly hinder their detection and blocking. While the execution of malware in a sandboxed environment may provide a lot of…

Cryptography and Security · Computer Science 2022-04-11 Vasilis Vouvoutsis, Fran Casino, Constantinos Patsakis

A Comprehensive Study on Learning-Based PE Malware Family Classification Methods

Driven by the high profit, Portable Executable (PE) malware has been consistently evolving in terms of both volume and sophistication. PE malware family classification has gained great attention and a large number of approaches have been…

Cryptography and Security · Computer Science 2021-11-01 Yixuan Ma, Shuang Liu, Jiajun Jiang, Guanhong Chen, Keqiu Li

Instance Attack: An Explanation-based Vulnerability Analysis Framework Against DNNs for Malware Detection

Deep neural networks (DNNs) are increasingly being applied in malware detection and their robustness has been widely debated. Traditionally an adversarial example generation scheme relies on either detailed model information (gradient-based…

Cryptography and Security · Computer Science 2022-09-07 Sun RuiJin, Guo ShiZe, Guo JinHong, Xing ChangYou, Yang LuMing, Guo Xi, Pan ZhiSong

Can bidirectional encoder become the ultimate winner for downstream applications of foundation models?

Over the past few decades, Artificial Intelligence (AI) has progressed from the initial machine learning stage to the deep learning stage, and now to the stage of foundational models. Foundational models have the characteristics of…

Computation and Language · Computer Science 2024-11-28 Lewen Yang, Xuanyu Zhou, Juao Fan, Xinyi Xie, Shengxin Zhu

Machine Learning Aided Static Malware Analysis: A Survey and Tutorial

Malware analysis and detection techniques have been evolving during the last decade as a reflection to development of different malware techniques to evade network-based and host-based security protections. The fast growth in variety and…

Cryptography and Security · Computer Science 2018-08-06 Andrii Shalaginov, Sergii Banin, Ali Dehghantanha, Katrin Franke

Attacks on Visualization-Based Malware Detection: Balancing Effectiveness and Executability

With the rapid development of machine learning for image classification, researchers have found new applications of visualization techniques in malware detection. By converting binary code into images, researchers have shown satisfactory…

Cryptography and Security · Computer Science 2021-09-23 Hadjer Benkraouda, Jingyu Qian, Hung Quoc Tran, Berkay Kaplan

Malware Analysis with Artificial Intelligence and a Particular Attention on Results Interpretability

Malware detection and analysis are active research subjects in cybersecurity over the last years. Indeed, the development of obfuscation techniques, as packing, for example, requires special attention to detect recent variants of malware.…

Cryptography and Security · Computer Science 2021-07-26 Benjamin Marais, Tony Quertier, Christophe Chesneau

GraphBPE: Molecular Graphs Meet Byte-Pair Encoding

With the increasing attention to molecular machine learning, various innovations have been made in designing better models or proposing more comprehensive benchmarks. However, less is studied on the data preprocessing schedule for molecular…

Machine Learning · Computer Science 2024-07-30 Yuchen Shen, Barnabás Póczos