AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

Abstract

Modern open-world agents such as OpenClaw exhibit powerful cross-environment execution capabilities, yet they introduce broad new sources of safety risk. Meanwhile, advanced frontier AI models drastically lower the barrier to attack, rendering current agent alignment frameworks inadequate for real-world deployment. To address these emerging threats, we propose a lightweight and scalable agent safety alignment framework. Specifically, we update the agent safety taxonomy to cover emergent risks from Codex and OpenClaw execution scenarios. We further build a taxonomy-guided data engine with influence-function purification to train lightweight AgentDoG 1.5 variants (0.8B, 2B, 4B, and 8B parameters) using only about 1k samples, achieving performance comparable to leading closed-source models (e.g., GPT-5.4). Building on AgentDoG 1.5, we construct a highly efficient agentic safety SFT and RL training environment, which reduces deployment overhead in Docker-level environments by two orders of magnitude. Finally, we deploy AgentDoG 1.5 as a training-free online guardrail for real-time safety moderation. Extensive experiments indicate that AgentDoG 1.5 achieves state-of-the-art performance in diverse and complex interactive agentic scenarios. All models and datasets are openly released.

Comments: 44 pages, 12 figures, 9 tables

Cite

@article{arxiv.2605.29801,
  title  = {AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security},
  author = {Dongrui Liu and Yu Li and Zhonghao Yang and Peng Wang and Guanxu Chen and Yuejin Xie and Qinghua Mao and Wanying Qu and Yanxu Zhu and Tianyi Zhou and Leitao Yuan and Zhijie Zheng and Qihao Lin and Yimin Wang and Haoyu Luo and Shuai Shao and Chen Qian and Qingyu Liu and Ling Tang and Ruiyang Qin and Qihan Ren and Junxiao Yang and Kun Wang and Zhiheng Xi and Linfeng Zhang and Ranjie Duan and Bo Zhang and Wenjie Wang and Wen Shen and Qiaosheng Zhang and Yan Teng and Chaochao Lu and Rui Mei and Man Li and Jialing Tao and Xi Lin and Tianhang Zheng and Yong Liu and Quanshi Zhang and Lei Zhu and Xingjun Ma and Junhua Liu and Hui Xue and Xiaoxiang Zuo and Xiangnan He and Chao Shen and Xianglong Liu and Minlie Huang and Jing Shao and Xia Hu},
  journal = {arXiv preprint arXiv:2605.29801},
  year   = {2026}
}