Related papers: Shellcode_IA32: A Dataset for Automatic Shellcode …

Can We Generate Shellcodes via Natural Language? An Empirical Study

Writing software exploits is an important practice for offensive security analysts to investigate and prevent attacks. In particular, shellcodes are especially time-consuming and a technical challenge, as they are written in assembly…

Software Engineering · Computer Science 2022-03-09 Pietro Liguori, Erfan Al-Hossami, Domenico Cotroneo, Roberto Natella, Bojan Cukic, Samira Shaikh

The Power of Words: Generating PowerShell Attacks from Natural Language

As the Windows OS stands out as one of the most targeted systems, the PowerShell language has become a key tool for malicious actors and cybersecurity professionals (e.g., for penetration testing). This work explores an uncharted domain in…

Cryptography and Security · Computer Science 2024-04-22 Pietro Liguori, Christian Marescalco, Roberto Natella, Vittorio Orbinato, Luciano Pianese

NL2CMD: An Updated Workflow for Natural Language to Bash Commands Translation

Translating natural language into Bash Commands is an emerging research field that has gained attention in recent years. Most efforts have focused on producing more accurate translation models. To the best of our knowledge, only two…

Computation and Language · Computer Science 2023-06-21 Quchen Fu, Zhongwei Teng, Marco Georgaklis, Jules White, Douglas C. Schmidt

DualSC: Automatic Generation and Summarization of Shellcode via Transformer and Dual Learning

A shellcode is a small piece of code and it is executed to exploit a software vulnerability, which allows the target computer to execute arbitrary commands from the attacker through a code injection attack. Similar to the purpose of…

Software Engineering · Computer Science 2022-02-22 Guang Yang, Xiang Chen, Yanlin Zhou, Chi Yu

Neural Machine Translation for Code Generation

Neural machine translation (NMT) methods developed for natural language processing have been shown to be highly successful in automating translation from one natural language to another. Recently, these NMT methods have been adapted to the…

Computation and Language · Computer Science 2023-05-24 Dharma KC, Clayton T. Morrison

On the use of LLMs to generate a dataset of Neural Networks

Neural networks are increasingly used to support decision-making. To verify their reliability and adaptability, researchers and practitioners have proposed a variety of tools and methods for tasks such as NN code verification, refactoring,…

Machine Learning · Computer Science 2026-02-05 Nadia Daoudi, Jordi Cabot

Semantic Code Classification for Automated Machine Learning

A range of applications for automatic machine learning need the generation process to be controllable. In this work, we propose a way to control the output via a sequence of simple actions, that are called semantic code classes. Finally, we…

Machine Learning · Computer Science 2022-01-28 Polina Guseva, Anastasia Drozdova, Natalia Denisenko, Daria Sapozhnikova, Ivan Pyaternev, Anna Scherbakova, Andrey Ustuzhanin

Improving Automated Secure Code Reviews: A Synthetic Dataset for Code Vulnerability Flaws

Automation of code reviews using AI models has garnered substantial attention in the software engineering community as a strategy to reduce the cost and effort associated with traditional peer review processes. These models are typically…

Software Engineering · Computer Science 2025-04-24 Leonardo Centellas-Claros, Juan J. Alonso-Lecaros, Juan Pablo Sandoval Alcocer, Andres Neyem

Enhancing AI-based Generation of Software Exploits with Contextual Information

This practical experience report explores Neural Machine Translation (NMT) models' capability to generate offensive security code from natural language (NL) descriptions, highlighting the significance of contextual understanding and its…

Software Engineering · Computer Science 2024-09-09 Pietro Liguori, Cristina Improta, Roberto Natella, Bojan Cukic, Domenico Cotroneo

NoviCode: Generating Programs from Natural Language Utterances by Novices

Current Text-to-Code models demonstrate impressive capabilities in generating executable code from natural language snippets. However, current studies focus on technical instructions and programmer-oriented language, and it is an open…

Computation and Language · Computer Science 2024-07-17 Asaf Achi Mordechai, Yoav Goldberg, Reut Tsarfaty

Towards Automatic Composition of ASP Programs from Natural Language Specifications

This paper moves the first step towards automating the composition of Answer Set Programming (ASP) specifications. In particular, the following contributions are provided: (i) A dataset focused on graph-related problem specifications,…

Artificial Intelligence · Computer Science 2024-03-08 Manuel Borroto, Irfan Kareem, Francesco Ricca

Shell Language Processing: Unix command parsing for Machine Learning

In this article, we present a Shell Language Preprocessing (SLP) library, which implements tokenization and encoding directed at parsing Unix and Linux shell commands. We describe the rationale behind the need for a new approach with…

Machine Learning · Computer Science 2022-07-08 Dmitrijs Trizna

EVIL: Exploiting Software via Natural Language

Writing exploits for security assessment is a challenging task. The writer needs to master programming and obfuscation techniques to develop a successful exploit. To make the task easier, we propose an approach (EVIL) to automatically…

Software Engineering · Computer Science 2022-03-09 Pietro Liguori, Erfan Al-Hossami, Vittorio Orbinato, Roberto Natella, Samira Shaikh, Domenico Cotroneo, Bojan Cukic

Guiding AI to Fix Its Own Flaws: An Empirical Study on LLM-Driven Secure Code Generation

Large Language Models (LLMs) have become powerful tools for automated code generation. However, these models often overlook critical security practices, which can result in the generation of insecure code that contains…

Software Engineering · Computer Science 2025-07-01 Hao Yan, Swapneel Suhas Vaidya, Xiaokuan Zhang, Ziyu Yao

RedShell: A Generative AI-Based Approach to Ethical Hacking

The application of Machine Learning techniques in code generation is now a common practice for most developers. Tools such as ChatGPT from OpenAI leverage the natural language processing capabilities of Large Language Models to generate…

Cryptography and Security · Computer Science 2026-04-14 Ricardo Bessa, Rui Claro, João Trindade, João Lourenço

Constructing Multilingual Code Search Dataset Using Neural Machine Translation

Code search is a task to find programming codes that semantically match the given natural language queries. Even though some of the existing datasets for this task are multilingual on the programming language side, their query data are only…

Computation and Language · Computer Science 2023-06-28 Ryo Sekizawa, Nan Duan, Shuai Lu, Hitomi Yanaka

code2seq: Generating Sequences from Structured Representations of Code

The ability to generate natural language sequences from source code snippets has a variety of applications such as code summarization, documentation, and retrieval. Sequence-to-sequence (seq2seq) models, adopted from neural machine…

Machine Learning · Computer Science 2019-02-22 Uri Alon, Shaked Brody, Omer Levy, Eran Yahav

An Automatically Created Novel Bug Dataset and its Validation in Bug Prediction

Bugs are inescapable during software development due to frequent code changes, tight deadlines, etc.; therefore, it is important to have tools to find these errors. One way of performing bug identification is to analyze the characteristics…

Software Engineering · Computer Science 2020-06-19 Rudolf Ferenc, Péter Gyimesi, Gábor Gyimesi, Zoltán Tóth, Tibor Gyimóthy

AixBench: A Code Generation Benchmark Dataset

We present a benchmark dataset for evaluating the method-level code generation task. The benchmark contains a dataset of 175 samples for automated evaluation and a dataset of 161 samples for manual evaluation. We also present a new metric for…

Software Engineering · Computer Science 2022-07-22 Yiyang Hao, Ge Li, Yongqiang Liu, Xiaowei Miao, He Zong, Siyuan Jiang, Yang Liu, He Wei

A Comprehensive Review of State-of-The-Art Methods for Java Code Generation from Natural Language Text

Java code generation consists of automatically generating Java code from natural language text. This NLP task helps increase programmers' productivity by providing them with immediate solutions to the simplest and most repetitive…

Computation and Language · Computer Science 2023-06-13 Jessica López Espejel, Mahaman Sanoussi Yahaya Alassan, El Mehdi Chouham, Walid Dahhane, El Hassane Ettifouri