Classifying Web Exploits with Topic Modeling

Cryptography and Security 2017-10-17 v1 Information Retrieval Software Engineering

Abstract

This short empirical paper investigates how well topic modeling and database meta-data characteristics can classify web and other proof-of-concept (PoC) exploits for publicly disclosed software vulnerabilities. Using a dataset of over 36 thousand PoC exploits, the empirical experiment attains an accuracy rate of nearly 0.9. Text mining and topic modeling contribute substantially to this classification performance. In addition to these empirical results, the paper contributes to the research tradition of enhancing software vulnerability information with text mining, and it offers a few scholarly observations about the potential for semi-automatic classification of exploits in existing tracking infrastructures.

Cite

@article{arxiv.1710.05561,
  title  = {Classifying Web Exploits with Topic Modeling},
  author = {Jukka Ruohonen},
  journal = {arXiv preprint arXiv:1710.05561},
  year   = {2017}
}

Comments

Proceedings of the 2017 28th International Workshop on Database and Expert Systems Applications (DEXA). http://ieeexplore.ieee.org/abstract/document/8049693/
