Related papers: Shell Language Processing: Unix command parsing fo…

Natural Language or Not (NLoN) - A Package for Software Engineering Text Analysis Pipeline

The use of natural language processing (NLP) is gaining popularity in software engineering. In order to correctly perform NLP, we must pre-process the textual information to separate natural language from other information, such as log…

Software Engineering · Computer Science 2018-03-21 Mika V. Mäntylä, Fabio Calefato, Maelick Claes

SPELL: Semantic Prompt Evolution based on a LLM

Prompt engineering is a new paradigm for enhancing the performance of trained neural network models. For optimizing text-style prompts, existing methods usually individually operate small portions of a text step by step, which either breaks…

Computation and Language · Computer Science 2023-10-03 Yujian Betterest Li, Kai Wu

Can We Generate Shellcodes via Natural Language? An Empirical Study

Writing software exploits is an important practice for offensive security analysts to investigate and prevent attacks. In particular, shellcodes are especially time-consuming and a technical challenge, as they are written in assembly…

Software Engineering · Computer Science 2022-03-09 Pietro Liguori, Erfan Al-Hossami, Domenico Cotroneo, Roberto Natella, Bojan Cukic, Samira Shaikh

Changing the Representation: Examining Language Representation for Neural Sign Language Production

Neural Sign Language Production (SLP) aims to automatically translate from spoken language sentences to sign language videos. Historically, the SLP task has been broken into two steps: firstly, translating from a spoken language sentence to…

Computation and Language · Computer Science 2022-10-13 Harry Walsh, Ben Saunders, Richard Bowden

LLM-Supported Natural Language to Bash Translation

The Bourne-Again Shell (Bash) command-line interface for Linux systems has complex syntax and requires extensive specialized knowledge. Using the natural language to Bash command (NL2SH) translation capabilities of large language models…

Computation and Language · Computer Science 2025-02-12 Finnian Westenfelder, Erik Hemberg, Miguel Tulla, Stephen Moskal, Una-May O'Reilly, Silviu Chiricescu

Enhancing Large Language Models with Faster Code Preprocessing for Vulnerability Detection

The application of Artificial Intelligence has become a powerful approach to detecting software vulnerabilities. However, effective vulnerability detection relies on accurately capturing the semantic structure of code and its contextual…

Software Engineering · Computer Science 2025-05-12 José Gonçalves, Miguel Silva, Eva Maia, Isabel Praça

Natural Language Processing for Policymaking

Language is the medium for many political activities, from campaigns to news reports. Natural language processing (NLP) uses computational tools to parse text into key information that is needed for policymaking. In this chapter, we…

Computation and Language · Computer Science 2023-02-08 Zhijing Jin, Rada Mihalcea

Semantic Preprocessing for LLM-based Malware Analysis

In a context of malware analysis, numerous approaches rely on Artificial Intelligence to handle a large volume of data. However, these techniques focus on data view (images, sequences) and not on an expert's view. Noticing this issue, we…

Cryptography and Security · Computer Science 2025-10-06 Benjamin Marais, Tony Quertier, Grégoire Barrue

Deciphering genomic codes using advanced NLP techniques: a scoping review

Objectives: The vast and complex nature of human genomic sequencing data presents challenges for effective analysis. This review aims to investigate the application of Natural Language Processing (NLP) techniques, particularly Large…

Genomics · Quantitative Biology 2025-02-28 Shuyan Cheng, Yishu Wei, Yiliang Zhou, Zihan Xu, Drew N Wright, Jinze Liu, Yifan Peng

SNNLP: Energy-Efficient Natural Language Processing Using Spiking Neural Networks

As spiking neural networks receive more attention, we look toward applications of this computing paradigm in fields other than computer vision and signal processing. One major field, underexplored in the neuromorphic setting, is Natural…

Computation and Language · Computer Science 2024-02-01 R. Alexander Knipper, Kaniz Mishty, Mehdi Sadi, Shubhra Kanti Karmaker Santu

On the Role of Text Preprocessing in Neural Network Architectures: An Evaluation Study on Text Categorization and Sentiment Analysis

Text preprocessing is often the first step in the pipeline of a Natural Language Processing (NLP) system, with potential impact in its final performance. Despite its importance, text preprocessing has not received much attention in the deep…

Computation and Language · Computer Science 2018-08-24 Jose Camacho-Collados, Mohammad Taher Pilehvar

Comparison Study Between Token Classification and Sequence Classification In Text Classification

Unsupervised Machine Learning techniques have been applied to Natural Language Processing tasks and have surpassed benchmarks such as GLUE with great success. The language-model-building approach achieves good results in one language and it can…

Computation and Language · Computer Science 2022-11-28 Amir Jafari

TokenBreak: Bypassing Text Classification Models Through Token Manipulation

Natural Language Processing (NLP) models are used for text-related tasks such as classification and generation. To complete these tasks, input data is first tokenized from human-readable text into a format the model can understand, enabling…

Machine Learning · Computer Science 2025-06-10 Kasimir Schulz, Kenneth Yeung, Kieran Evans

Learning Mechanism Underlying NLP Pre-Training and Fine-Tuning

Natural language processing (NLP) enables the understanding and generation of meaningful human language, typically using a pre-trained complex architecture on a large dataset to learn the language and next fine-tune its weights to implement…

Computation and Language · Computer Science 2025-09-04 Yarden Tzach, Ronit D. Gross, Ella Koresh, Shalom Rosner, Or Shpringer, Tal Halevi, Ido Kanter

Command-line Obfuscation Detection using Small Language Models

To avoid detection, adversaries often use command-line obfuscation. There are numerous techniques of the command-line obfuscation, all designed to alter the command-line syntax without affecting its original functionality. This variability…

Cryptography and Security · Computer Science 2024-08-06 Vojtech Outrata, Michael Adam Polak, Martin Kopp

NL2Bash: A Corpus and Semantic Parser for Natural Language Interface to the Linux Operating System

We present new data and semantic parsing methods for the problem of mapping English sentences to Bash commands (NL2Bash). Our long-term goal is to enable any user to perform operations such as file manipulation, search, and…

Computation and Language · Computer Science 2018-03-05 Xi Victoria Lin, Chenglong Wang, Luke Zettlemoyer, Michael D. Ernst

Natural Language Processing for Dialects of a Language: A Survey

State-of-the-art natural language processing (NLP) models are trained on massive training corpora, and report a superlative performance on evaluation datasets. This survey delves into an important attribute of these datasets: the dialect of…

Computation and Language · Computer Science 2024-12-10 Aditya Joshi, Raj Dabre, Diptesh Kanojia, Zhuang Li, Haolan Zhan, Gholamreza Haffari, Doris Dippold

Machine Learning (ML) library in Linux kernel

The Linux kernel is a huge code base with an enormous number of subsystems and possible configuration options, which results in unmanageable complexity when elaborating an efficient configuration. Machine Learning (ML) is an approach/area of learning…

Machine Learning · Computer Science 2026-03-03 Viacheslav Dubeyko

Natural Language Processing: A Comprehensive Practical Guide from Tokenisation to RLHF

This preprint presents a systematic, research-oriented practicum that guides the reader through the entire modern NLP pipeline: from tokenisation and vectorisation to fine-tuning of large language models, retrieval-augmented generation, and…

Computation and Language · Computer Science 2026-05-12 Mullosharaf K. Arabov

Security Vulnerability Detection Using Deep Learning Natural Language Processing

Detecting security vulnerabilities in software before they are exploited has been a challenging problem for decades. Traditional code analysis methods have been proposed, but are often ineffective and inefficient. In this work, we model…

Cryptography and Security · Computer Science 2021-05-07 Noah Ziems, Shaoen Wu