We take a first step toward automatically generating shellcodes, i.e., small pieces of code used as a payload in the exploitation of a software vulnerability, from natural language comments. We assemble and release a novel dataset (Shellcode_IA32) consisting of challenging but common assembly instructions paired with their natural language descriptions. We experiment with standard neural machine translation (NMT) methods to establish baseline performance levels on this task.
@article{arxiv.2104.13100,
title = {Shellcode_IA32: A Dataset for Automatic Shellcode Generation},
author = {Pietro Liguori and Erfan Al-Hossami and Domenico Cotroneo and Roberto Natella and Bojan Cukic and Samira Shaikh},
journal = {arXiv preprint arXiv:2104.13100},
year = {2022}
}
Comments
Paper accepted to the NLP4Prog Workshop 2021, co-located with ACL-IJCNLP 2021. An extended journal version of this work has been published in the Automated Software Engineering journal, Volume 29, Article no. 30, March 2022, DOI: 10.1007/s10515-022-00331-3