
Cyber Knowledge Completion Using Large Language Models

Cryptography and Security; Artificial Intelligence | 2024-09-25 | v1

Abstract

The integration of the Internet of Things (IoT) into Cyber-Physical Systems (CPSs) has expanded their cyber-attack surface, introducing new and sophisticated threats with the potential to exploit emerging vulnerabilities. Assessing the risks of CPSs is increasingly difficult because the available cybersecurity knowledge is often incomplete and outdated, which highlights the urgent need for better-informed risk assessments and mitigation strategies. While previous efforts have relied on rule-based natural language processing (NLP) tools to map vulnerabilities, weaknesses, and attack patterns, recent advances in Large Language Models (LLMs) present a unique opportunity to enhance cyber-attack knowledge completion through their improved reasoning, inference, and summarization capabilities. We apply embedding models to encapsulate information on attack patterns and adversarial techniques, and we generate mappings between them using vector embeddings. Additionally, we propose a Retrieval-Augmented Generation (RAG)-based approach that leverages pre-trained models to create structured mappings between different taxonomies of threat patterns. Finally, we use a small hand-labeled dataset to compare the proposed RAG-based approach with a standard binary classification baseline. Together, these components provide a comprehensive framework for addressing the challenge of cyber-attack knowledge graph completion.
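The embedding-based mapping step described above can be illustrated with a minimal sketch. It assumes that each attack pattern and each adversarial technique is represented by a short text description, and it substitutes a toy bag-of-words vector for the learned embedding models the abstract refers to. The identifiers (`pattern-1`, `technique-A`, ...), their descriptions, and the helper names `embed` and `map_patterns` are all hypothetical. Each pattern is mapped to the technique(s) whose vectors have the highest cosine similarity.

```python
import math
import re
from collections import Counter

# ASSUMPTION: a bag-of-words term-frequency vector stands in for a learned
# embedding model; only the similarity-based mapping logic is illustrated.
def embed(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description matches nothing
    return dot / (norm_a * norm_b)

def map_patterns(patterns: dict[str, str], techniques: dict[str, str],
                 top_k: int = 1) -> dict[str, list[tuple[str, float]]]:
    """Rank techniques for each attack pattern by cosine similarity."""
    tech_vecs = {tid: embed(desc) for tid, desc in techniques.items()}
    mapping = {}
    for pid, desc in patterns.items():
        pvec = embed(desc)
        scored = sorted(
            ((tid, cosine(pvec, tvec)) for tid, tvec in tech_vecs.items()),
            key=lambda pair: pair[1],
            reverse=True,
        )
        mapping[pid] = scored[:top_k]  # keep the k best-matching techniques
    return mapping

# Hypothetical descriptions, for illustration only.
PATTERNS = {
    "pattern-1": "adversary sends phishing email with malicious link to steal credentials",
    "pattern-2": "adversary floods network service with traffic to cause denial of service",
}
TECHNIQUES = {
    "technique-A": "phishing email with malicious attachment or link",
    "technique-B": "network denial of service by flooding traffic",
}

if __name__ == "__main__":
    for pid, matches in map_patterns(PATTERNS, TECHNIQUES).items():
        for tid, score in matches:
            print(f"{pid} -> {tid} (cosine {score:.3f})")
```

Replacing `embed` with a real embedding model leaves the ranking logic unchanged. In a RAG-style pipeline, a ranking like this could plausibly serve as the retrieval step, with a language model then judging the retrieved candidates; that is an assumption about how the pieces fit together, not a description of the paper's exact pipeline.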

Cite

@article{arxiv.2409.16176,
  title  = {Cyber Knowledge Completion Using Large Language Models},
  author = {Braden K Webb and Sumit Purohit and Rounak Meyur},
  journal= {arXiv preprint arXiv:2409.16176},
  year   = {2024}
}

Comments

7 pages, 2 figures. Submitted to 2024 IEEE International Conference on Big Data
