Related papers: Machine Learning Transferability for Malware Detec…
This paper describes EMBER: a labeled benchmark dataset for training machine learning models to statically detect malicious Windows portable executable files. The dataset includes features extracted from 1.1M binary files: 900K training…
The constant growth in the number of malware - software or code fragment potentially harmful for computers and information networks - and the use of sophisticated evasion and obfuscation techniques have seriously hindered classic…
The threat of malware is a serious concern for computer networks and systems, highlighting the need for accurate classification techniques. In this research, we experiment with multimodal machine learning approaches for malware…
In this paper, we explore the use of metric learning to embed Windows PE files in a low-dimensional vector space for downstream use in a variety of applications, including malware detection, family classification, and malware attribute…
The increasing number of sophisticated malware poses a major cybersecurity threat. Portable executable (PE) files are a common vector for such malware. In this work we review and evaluate machine learning-based PE malware detection…
This study investigates the effectiveness of several machine learning algorithms for static malware detection using the EMBER dataset, which contains feature representations of Portable Executable (PE) files. We evaluate eight…
In addition to signature-based and heuristics-based detection techniques, machine learning (ML) is widely used to generalize to new, never-before-seen malicious software (malware). However, it has been demonstrated that ML models can be…
In recent years there has been a shift from heuristics-based malware detection towards machine learning, which proves to be more robust in the current heavily adversarial threat landscape. While we acknowledge machine learning to be better…
Recent work has shown that adversarial Windows malware samples - referred to as adversarial EXEmples in this paper - can bypass machine learning-based detection relying on static code analysis by perturbing relatively few input bytes. To…
Despite the promising results of machine learning models in malicious files detection, they face the problem of concept drift due to their constant evolution. This leads to declining performance over time, as the data distribution of the…
Artificial intelligence techniques have achieved strong performance in classifying Windows Portable Executable (PE) malware, but their reliability often degrades under dataset shifts, leading to misclassifications with severe security…
A lack of accessible data has historically restricted malware analysis research, and practitioners have relied heavily on datasets provided by industry sources to advance. Existing public datasets are limited by narrow scope - most include…
In an era of escalating cyber threats, malware poses significant risks to individuals and organizations, potentially leading to data breaches, system failures, and substantial financial losses. This study addresses the urgent need for…
Driven by the high profit, Portable Executable (PE) malware has been consistently evolving in terms of both volume and sophistication. PE malware family classification has gained great attention and a large number of approaches have been…
Malware detection is a constant challenge in cybersecurity due to the rapid development of new attack techniques. Traditional signature-based approaches struggle to keep pace with the sheer volume of malware samples. Machine learning offers…
Automated malware analysis increasingly relies on machine learning, yet most existing methods remain task-specific and depend on handcrafted features or narrowly scoped models. Recent developments in binary-level foundation models suggest a…
Ontologies are a standard tool for creating semantic schemata in many knowledge intensive domains of human interest. They are becoming increasingly important also in the areas that have been until very recently dominated by subsymbolic…
Machine-learning methods have already been exploited as useful tools for detecting malicious executable files. They leverage data retrieved from malware samples, such as header fields, instruction sequences, or even raw bytes, to learn…
This paper describes a multi-feature dataset for training machine learning classifiers for detecting malicious Windows Portable Executable (PE) files. The dataset includes four feature sets from 18,551 binary samples belonging to five…
In today's interconnected digital landscape, the proliferation of malware poses a significant threat to the security and stability of computer networks and systems worldwide. As the complexity of malicious tactics, techniques, and…