Shell Language Processing: Unix command parsing for Machine Learning

Machine Learning 2022-07-08 v3 Programming Languages

Abstract

In this article, we present the Shell Language Preprocessing (SLP) library, which implements tokenization and encoding designed for parsing Unix and Linux shell commands. We explain why a new approach is needed, with specific examples of where conventional Natural Language Processing (NLP) pipelines fail. Furthermore, we evaluate our methodology on a security classification task against widely accepted information and communications technology (ICT) tokenization techniques and improve the F1 score from 0.392 to 0.874.
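
To illustrate the kind of failure the abstract refers to, the following minimal sketch contrasts a generic word tokenizer with a shell-aware one. It does not use the SLP library or reproduce its API; it relies only on Python's standard `re` and `shlex` modules, and the sample command and both function names are assumptions chosen for illustration.

```python
import re
import shlex

# Hypothetical sample command, chosen only for illustration.
command = "cat /etc/passwd | grep -v 'root user' > out.txt"


def nlp_style_tokenize(text):
    """Generic word tokenizer: keeps alphanumeric runs and drops punctuation."""
    return re.findall(r"[A-Za-z0-9]+", text)


def shell_aware_tokenize(text):
    """Shell-aware tokenizer built on the standard library's shlex.

    punctuation_chars=True makes operators such as | and > separate tokens,
    and whitespace_split=True keeps paths and flags intact.
    """
    lexer = shlex.shlex(text, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)


nlp_tokens = nlp_style_tokenize(command)
shell_tokens = shell_aware_tokenize(command)

print(nlp_tokens)
print(shell_tokens)
```

The generic tokenizer splits the path `/etc/passwd` into unrelated words, reduces the flag `-v` to a bare `v`, and discards the pipe and redirection operators, all of which carry meaning in a shell command. The `shlex`-based version preserves them and keeps the quoted argument as a single token.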

Cite

@article{arxiv.2107.02438,
  title  = {Shell Language Processing: Unix command parsing for Machine Learning},
  author = {Dmitrijs Trizna},
  journal= {arXiv preprint arXiv:2107.02438},
  year   = {2022}
}

Comments

4 pages, 1 table
