Related papers: RedCodeAgent: Automatic Red-teaming Agent against …

BlueCodeAgent: A Blue Teaming Agent Enabled by Automated Red Teaming for CodeGen AI

As large language models (LLMs) are increasingly used for code generation, concerns over the security risks have grown substantially. Early research has primarily focused on red teaming, which aims to uncover and evaluate vulnerabilities…

Software Engineering · Computer Science 2025-10-22 Chengquan Guo, Yuzhou Nie, Chulin Xie, Zinan Lin, Wenbo Guo, Bo Li

RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent

Recently, advanced Large Language Models (LLMs) such as GPT-4 have been integrated into many real-world applications like Code Copilot. These applications have significantly expanded the attack surface of LLMs, exposing them to a variety of…

Cryptography and Security · Computer Science 2024-07-24 Huiyu Xu, Wenhui Zhang, Zhibo Wang, Feng Xiao, Rui Zheng, Yunhe Feng, Zhongjie Ba, Kui Ren

RedCoder: Automated Multi-Turn Red Teaming for Code LLMs

Large Language Models (LLMs) for code generation (i.e., Code LLMs) have demonstrated impressive capabilities in AI-assisted software development and testing. However, recent studies have shown that these models are prone to generating…

Software Engineering · Computer Science 2025-07-31 Wenjie Jacky Mo, Qin Liu, Xiaofei Wen, Dongwon Jung, Hadi Askari, Wenxuan Zhou, Zhe Zhao, Muhao Chen

RedCode: Risky Code Execution and Generation Benchmark for Code Agents

With the rapidly increasing capabilities and adoption of code agents for AI-assisted coding, safety concerns, such as generating or executing risky code, have become significant barriers to the real-world deployment of these agents. To…

Software Engineering · Computer Science 2024-11-13 Chengquan Guo, Xun Liu, Chulin Xie, Andy Zhou, Yi Zeng, Zinan Lin, Dawn Song, Bo Li

AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration

As large language models (LLMs) become increasingly capable, security and safety evaluation are crucial. While current red teaming approaches have made strides in assessing LLM vulnerabilities, they often rely heavily on human input and…

Cryptography and Security · Computer Science 2025-03-21 Andy Zhou, Kevin Wu, Francesco Pinto, Zhaorun Chen, Yi Zeng, Yu Yang, Shuang Yang, Sanmi Koyejo, James Zou, Bo Li

DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents

AI agents are increasingly deployed across diverse domains to automate complex workflows through long-horizon and high-stakes action executions. Due to their high capability and flexibility, such agents raise significant security and safety…

Artificial Intelligence · Computer Science 2026-05-07 Zhaorun Chen, Xun Liu, Haibo Tong, Chengquan Guo, Yuzhou Nie, Jiawei Zhang, Mintong Kang, Chejian Xu, Qichang Liu, Xiaogeng Liu, Tianneng Shi, Chaowei Xiao, Sanmi Koyejo, Percy Liang, Wenbo Guo, Dawn Song, Bo Li

AdvAgent: Controllable Blackbox Red-teaming on Web Agents

Foundation model-based agents are increasingly used to automate complex tasks, enhancing efficiency and productivity. However, their access to sensitive resources and autonomous decision-making also introduce significant security risks,…

Cryptography and Security · Computer Science 2025-06-03 Chejian Xu, Mintong Kang, Jiawei Zhang, Zeyi Liao, Lingbo Mo, Mengqi Yuan, Huan Sun, Bo Li

Co-RedTeam: Orchestrated Security Discovery and Exploitation with LLM Agents

Large language models (LLMs) have shown promise in assisting cybersecurity tasks, yet existing approaches struggle with automatic vulnerability discovery and exploitation due to limited interaction, weak execution grounding, and a lack of…

Machine Learning · Computer Science 2026-02-05 Pengfei He, Ash Fox, Lesly Miculicich, Stefan Friedli, Daniel Fabian, Burak Gokturk, Jiliang Tang, Chen-Yu Lee, Tomas Pfister, Long T. Le

LeakAgent: RL-based Red-teaming Agent for LLM Privacy Leakage

Recent studies have discovered that large language models (LLM) may be "fooled" to output private information, including training data, system prompts, and personally identifiable information, under carefully crafted adversarial prompts.…

Cryptography and Security · Computer Science 2025-08-11 Yuzhou Nie, Zhun Wang, Ye Yu, Xian Wu, Xuandong Zhao, Wenbo Guo, Dawn Song

Red-Teaming Coding Agents from a Tool-Invocation Perspective: An Empirical Security Assessment

Coding agents powered by large language models are becoming central modules of modern IDEs, helping users perform complex tasks by invoking tools. While powerful, tool invocation opens a substantial attack surface. Prior work has…

Cryptography and Security · Computer Science 2026-01-06 Yuchong Xie, Mingyu Luo, Zesen Liu, Zhixiang Zhang, Kaikai Zhang, Yu Liu, Zongjie Li, Ping Chen, Shuai Wang, Dongdong She

A Systematic Review of Algorithmic Red Teaming Methodologies for Assurance and Security of AI Applications

Cybersecurity threats are becoming increasingly sophisticated, making traditional defense mechanisms and manual red teaming approaches insufficient for modern organizations. While red teaming has long been recognized as an effective method…

Cryptography and Security · Computer Science 2026-02-26 Shruti Srivastava, Kiranmayee Janardhan, Shaurya Jauhari

RedDebate: Safer Responses through Multi-Agent Red Teaming Debates

We introduce RedDebate, a novel multi-agent debate framework that provides the foundation for Large Language Models (LLMs) to identify and mitigate their unsafe behaviours. Existing AI safety approaches often rely on costly human evaluation…

Computation and Language · Computer Science 2025-10-13 Ali Asad, Stephen Obadinma, Radin Shayanfar, Xiaodan Zhu

ResearchCodeAgent: An LLM Multi-Agent System for Automated Codification of Research Methodologies

In this paper we introduce ResearchCodeAgent, a novel multi-agent system leveraging large language models (LLMs) agents to automate the codification of research methodologies described in machine learning literature. The system bridges the…

Software Engineering · Computer Science 2025-05-06 Shubham Gandhi, Dhruv Shah, Manasi Patwardhan, Lovekesh Vig, Gautam Shroff

RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments

Computer-use agents (CUAs) promise to automate complex tasks across operating systems (OS) and the web, but remain vulnerable to indirect prompt injection. Current evaluations of this threat either lack support for realistic but controlled…

Computation and Language · Computer Science 2026-03-03 Zeyi Liao, Jaylen Jones, Linxi Jiang, Yuting Ning, Eric Fosler-Lussier, Yu Su, Zhiqiang Lin, Huan Sun

MonitoringBench: Semi-Automated Red-Teaming for Agent Monitoring

We introduce a red-teaming methodology that exposes harder-to-catch attacks for coding-agent monitors, suggesting that current practices may under-elicit attacks and overstate monitor performance. We identify three challenges with current…

Cryptography and Security · Computer Science 2026-05-12 Monika Jotautaitė, Maria Angelica Martinez, Ollie Matthews, Tyler Tracy

RedTeamLLM: an Agentic AI framework for offensive security

From automated intrusion testing to discovery of zero-day attacks before software launch, agentic AI calls for great promises in security engineering. This strong capability is bound with a similar threat: the security and research…

Cryptography and Security · Computer Science 2025-05-13 Brian Challita, Pierre Parrend

Kaleidoscopic Teaming in Multi Agent Simulations

Warning: This paper contains content that may be inappropriate or offensive. AI agents have gained significant recent attention due to their autonomous tool usage capabilities and their integration in various real-world applications. This…

Artificial Intelligence · Computer Science 2025-06-24 Ninareh Mehrabi, Tharindu Kumarage, Kai-Wei Chang, Aram Galstyan, Rahul Gupta

SkillAttack: Automated Red Teaming of Agent Skills through Attack Path Refinement

LLM-based agent systems increasingly rely on agent skills sourced from open registries to extend their capabilities, yet the openness of such ecosystems makes skills difficult to thoroughly vet. Existing attacks rely on injecting malicious…

Cryptography and Security · Computer Science 2026-04-08 Zenghao Duan, Yuxin Tian, Zhiyi Yin, Liang Pang, Jingcheng Deng, Zihao Wei, Shicheng Xu, Yuyao Ge, Xueqi Cheng

TreeTeaming: Autonomous Red-Teaming of Vision-Language Models via Hierarchical Strategy Exploration

The rapid advancement of Vision-Language Models (VLMs) has brought their safety vulnerabilities into sharp focus. However, existing red teaming methods are fundamentally constrained by an inherent linear exploration paradigm, confining them…

Machine Learning · Computer Science 2026-03-25 Chunxiao Li, Lijun Li, Jing Shao

AutoSafeCoder: A Multi-Agent Framework for Securing LLM Code Generation through Static Analysis and Fuzz Testing

Recent advancements in automatic code generation using large language models (LLMs) have brought us closer to fully automated secure software development. However, existing approaches often rely on a single agent for code generation, which…

Software Engineering · Computer Science 2024-11-06 Ana Nunez, Nafis Tanveer Islam, Sumit Kumar Jha, Peyman Najafirad