Andrey Labunets
We take the position that agent security must be approached as a systems problem: the AI model powering the agent must be treated as an untrusted component, and security invariants must be enforced at the system level. Through this lens,…
In recent years, agentic artificial intelligence (AI) systems are becoming increasingly widespread. These systems allow agents to use various tools, such as web browsers, compilers, and more. However, despite their popularity, agentic AI…
A popular class of defenses against prompt injection attacks on large language models (LLMs) relies on fine-tuning to separate instructions and data, so that the LLM does not follow instructions that might be present with data. We evaluate…
We surface a new threat to closed-weight Large Language Models (LLMs) that enables an attacker to compute optimization-based prompt injections. Specifically, we characterize how an attacker can leverage the loss-like information returned…
Government investigatory and surveillance powers are important tools for examining crime and protecting public safety. However, since these tools must be employed in secret, it can be challenging to identify abuses or changes in use that…
Content scanning systems employ perceptual hashing algorithms to scan user content for illegal material, such as child pornography or terrorist recruitment flyers. Perceptual hashing algorithms help determine whether two images are visually…
With the growth of social media platform influence, the effect of their misuse becomes more and more impactful. The importance of automatic detection of threatening and abusive language can not be overestimated. However, most of the…