LLMs for Domain Generation Algorithm Detection

Computation and Language 2024-11-06 v1 Cryptography and Security

Abstract

This work analyzes the use of large language models (LLMs) for detecting domain generation algorithms (DGAs). We evaluate two techniques in detail, In-Context Learning (ICL) and Supervised Fine-Tuning (SFT), and show how each can improve detection. SFT improves performance by training on domain-specific data, whereas ICL lets the detection model adapt quickly to new threats without extensive retraining. We apply Meta's Llama3 8B model to a custom dataset containing 68 malware families and normal domains, covering several hard-to-detect schemes, including recent word-based DGAs. The results show that LLM-based methods can achieve competitive performance in DGA detection. In particular, the SFT-based LLM DGA detector outperforms state-of-the-art models that use attention layers, achieving 94% accuracy with a 4% false positive rate (FPR) and excelling at detecting word-based DGA domains.
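
As a rough illustration of the ICL setting and of the two metrics reported above, the following sketch assembles a few-shot classification prompt and computes accuracy and FPR from predicted labels. It is not the authors' prompt or evaluation code: the example domains, the label names `benign` and `dga`, the prompt wording, and the helper names `build_icl_prompt` and `accuracy_and_fpr` are all assumptions made for illustration, and the call to the LLM itself is left out.

```python
from typing import List, Sequence, Tuple

# Hypothetical labeled examples; these domains are made up for illustration
# and do not come from the paper's dataset.
FEW_SHOT: List[Tuple[str, str]] = [
    ("google.com", "benign"),
    ("xjwqkzpvtr.net", "dga"),
    ("wikipedia.org", "benign"),
    ("tablecandlerivermoon.com", "dga"),  # word-based style
]


def build_icl_prompt(examples: Sequence[Tuple[str, str]], query: str) -> str:
    """Build a few-shot prompt: labeled examples, then the unlabeled query."""
    lines = ["Classify each domain as 'benign' or 'dga'.", ""]
    for domain, label in examples:
        lines.append(f"Domain: {domain}")
        lines.append(f"Label: {label}")
        lines.append("")
    lines.append(f"Domain: {query}")
    lines.append("Label:")  # the model is expected to complete this line
    return "\n".join(lines)


def accuracy_and_fpr(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[float, float]:
    """Return (accuracy, false positive rate), treating 'dga' as the positive class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    false_pos = sum(t == "benign" and p == "dga" for t, p in zip(y_true, y_pred))
    negatives = sum(t == "benign" for t in y_true)
    accuracy = correct / len(y_true)
    fpr = false_pos / negatives if negatives else 0.0
    return accuracy, fpr


if __name__ == "__main__":
    print(build_icl_prompt(FEW_SHOT, "example.org"))
    print(accuracy_and_fpr(["benign", "benign", "dga", "dga"],
                           ["benign", "dga", "dga", "dga"]))
```

Under this convention, the FPR is the fraction of benign domains that are flagged as DGA, so a 4% FPR would correspond to 4 of every 100 benign domains being misclassified.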

Cite

@article{arxiv.2411.03307,
  title  = {LLMs for Domain Generation Algorithm Detection},
  author = {Reynier Leyva La O and Carlos A. Catania and Tatiana Parlanti},
  journal= {arXiv preprint arXiv:2411.03307},
  year   = {2024}
}