Neil Perry — Scifaro

How Vulnerable Are AI Agents to Indirect Prompt Injections? Insights from a Large-Scale Public Competition

LLM-based agents are increasingly deployed in high-stakes settings where they process external data sources such as emails, documents, and code repositories. This creates exposure to indirect prompt injection attacks, where adversarial…

Cryptography and Security · Computer Science · 2026-03-18 · Mateusz Dziemian, Maxwell Lin, Xiaohan Fu, Micha Nowak, Nick Winter, Eliot Jones, Andy Zou, Lama Ahmad, Kamalika Chaudhuri, Sahana Chennabasappa, Xander Davies, Lauren Deason, Benjamin L. Edelman, Tanner Emek, Ivan Evtimov, Jim Gust, Maia Hamin, Kat He, Klaudia Krawiecka, Riccardo Patana, Neil Perry, Troy Peterson, Xiangyu Qi, Javier Rando, Zifan Wang, Zihan Wang, Spencer Whitman, Eric Winsor, Arman Zharmagambetov, Matt Fredrikson, Zico Kolter

Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing

We present the first comprehensive evaluation of AI agents against human cybersecurity professionals in a live enterprise environment. We evaluate ten cybersecurity professionals alongside six existing AI agents and ARTEMIS, our new agent…

Artificial Intelligence · Computer Science · 2026-03-04 · Justin W. Lin, Eliot Krzysztof Jones, Donovan Julian Jasper, Ethan Jun-shen Ho, Anna Wu, Arnold Tianyi Yang, Neil Perry, Andy Zou, Matt Fredrikson, J. Zico Kolter, Percy Liang, Dan Boneh, Daniel E. Ho

Cryptographic Data Exchange for Nuclear Warheads

Nuclear arms control treaties have historically focused on strategic nuclear delivery systems, indirectly restricting strategic nuclear warhead numbers and leaving nonstrategic nuclear warheads (NSNWs) outside formal verification…

Cryptography and Security · Computer Science · 2025-09-05 · Neil Perry, Daniil Zhukov

Robust Steganography from Large Language Models

Recent steganographic schemes, starting with Meteor (CCS'21), rely on leveraging large language models (LLMs) to resolve a historically-challenging task of disguising covert communication as "innocent-looking" natural-language…

Cryptography and Security · Computer Science · 2025-04-15 · Neil Perry, Sanket Gupte, Nishant Pitta, Lior Rotem

Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models

Language Model (LM) agents for cybersecurity that are capable of autonomously identifying vulnerabilities and executing exploits have potential to cause real-world impact. Policymakers, model providers, and researchers in the AI and…

Cryptography and Security · Computer Science · 2025-04-15 · Andy K. Zhang, Neil Perry, Riya Dulepet, Joey Ji, Celeste Menders, Justin W. Lin, Eliot Jones, Gashon Hussein, Samantha Liu, Donovan Jasper, Pura Peetathawatchai, Ari Glenn, Vikram Sivashankar, Daniel Zamoshchin, Leo Glikbarg, Derek Askaryar, Mike Yang, Teddy Zhang, Rishi Alluri, Nathan Tran, Rinnara Sangpisit, Polycarpos Yiorkadjis, Kenny Osele, Gautham Raghupathi, Dan Boneh, Daniel E. Ho, Percy Liang

Do Users Write More Insecure Code with AI Assistants?

We conduct the first large-scale user study examining how users interact with an AI code assistant to solve a variety of security-related tasks across different programming languages. Overall, we find that participants who had access to an…

Cryptography and Security · Computer Science · 2023-12-19 · Neil Perry, Megha Srivastava, Deepak Kumar, Dan Boneh

Strong Anonymity for Mesh Messaging

Messaging systems built on mesh networks consisting of smartphones communicating over Bluetooth have been used by protesters around the world after governments have disrupted Internet connectivity. Unfortunately, existing systems have been…

Cryptography and Security · Computer Science · 2022-08-24 · Neil Perry, Bruce Spang, Saba Eskandarian, Dan Boneh