Related papers: Do Coding Agents Understand Least-Privilege Author…

No More, No Less: Least-Privilege Language Models

Least privilege is a core security principle: grant each request only the minimum access needed to achieve its goal. Deployed language models almost never follow it, instead being exposed through a single API endpoint that serves all users…

Cryptography and Security · Computer Science 2026-03-05 Paulius Rauba, Dominykas Seputis, Patrikas Vanagas, Mihaela van der Schaar

Authorization Propagation in Multi-Agent AI Systems: Identity Governance as Infrastructure

The security discussion around agentic AI focuses heavily on prompt injection. This paper argues that multi-agent systems also create a distinct authorization problem: maintaining authorization invariants as non-human principals retrieve…

Artificial Intelligence · Computer Science 2026-05-08 Krti Tallam

The Authorization Policy Existence Problem

Constraints such as separation-of-duty are widely used to specify requirements that supplement basic authorization policies. However, the existence of constraints (and authorization policies) may mean that a user is unable to fulfill…

Cryptography and Security · Computer Science 2016-12-20 Pierre Bergé, Jason Crampton, Gregory Gutin, Rémi Watrigant

Overeager Coding Agents: Measuring Out-of-Scope Actions on Benign Tasks

Coding agents now run autonomously with shell, file, and network privileges. When a user issues a benign request, the agent sometimes does more than asked: it deletes unrelated files, wipes a stale credentials backup, or rewrites…

Software Engineering · Computer Science 2026-05-19 Yubin Qu, Ying Zhang, Yanjun Zhang, Gelei Deng, Yuekang Li, Leo Yu Zhang, Yi Liu

The Open-Box Fallacy: Why AI Deployment Needs a Calibrated Verification Regime

AI deployment in sensitive domains such as health care, credit, employment, and criminal justice is often treated as unsafe to authorize until model internals can be explained. This often leads to an excessive reliance on mechanistic…

Artificial Intelligence · Computer Science 2026-05-12 Phongsakon Mark Konrad, Tim Lukas Adam, Ane Cathrine Holst Merrild, Riccardo Terrenzi, Rebecca De Rosa, Toygar Tanyel, Serkan Ayvaz

FORTIS: Benchmarking Over-Privilege in Agent Skills

Large language model agents increasingly operate through an intermediate skill layer that mediates between user intent and concrete task execution. This layer is widely treated as an organizational abstraction, but we argue it is also a…

Artificial Intelligence · Computer Science 2026-05-14 Shawn Li, Chenxiao Yu, Han Wang, Wei Yang, Ryan Rossi, Franck Dernoncourt, Xiyang Hu, Philip Yu, Chaowei Xiao, Huan Zhang, Yue Zhao

Towards Automating Data Access Permissions in AI Agents

As AI agents attempt to autonomously act on users' behalf, they raise transparency and control issues. We argue that permission-based access control is indispensable in providing meaningful control to the users, but conventional permission…

Cryptography and Security · Computer Science 2025-11-25 Yuhao Wu, Ke Yang, Franziska Roesner, Tadayoshi Kohno, Ning Zhang, Umar Iqbal

User Authorization in a System with a Role-Based Access Control on the Basis of the Analytic Hierarchy Process

The problem of optimal authorization of a user in a system with a role-based access control policy is considered. The main criterion is to minimize the risks of permission leakage. The choice of the role for authorization is based on the…

Cryptography and Security · Computer Science 2018-12-21 S. V. Belim, S. Yu. Belim, N. F. Bogachenko, A. N. Kabanov

Evaluating Language Model Reasoning about Confidential Information

As language models are increasingly deployed as autonomous agents in high-stakes settings, ensuring that they reliably follow user-defined rules has become a critical safety concern. To this end, we study whether language models exhibit…

Machine Learning · Computer Science 2025-08-28 Dylan Sam, Alexander Robey, Andy Zou, Matt Fredrikson, J. Zico Kolter

SkillScope: Toward Fine-Grained Least-Privilege Enforcement for Agent Skills

Agent Skills have become a practical way to extend LLM agents by packaging metadata, natural-language instructions, and executable resources into reusable capability bundles. However, this growing Skill ecosystem introduces a new compliance…

Cryptography and Security · Computer Science 2026-05-08 Jiangrong Wu, Yuhong Nan, Yixi Lin, Huaijin Wang, Yuming Xiao, Shuai Wang, Zibin Zheng

Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs

Autonomous code agents built on large language models are reshaping software and AI development through tool use, long-horizon reasoning, and self-directed interaction. However, this autonomy introduces a previously unrecognized security…

Artificial Intelligence · Computer Science 2026-01-30 Xiang Zheng, Yutao Wu, Hanxun Huang, Yige Li, Xingjun Ma, Bo Li, Yu-Gang Jiang, Cong Wang

Evaluating and Understanding Scheming Propensity in LLM Agents

As frontier language models are increasingly deployed as autonomous agents pursuing complex, long-term objectives, there is increased risk of scheming: agents covertly pursuing misaligned goals. Prior work has focused on showing agents are…

Artificial Intelligence · Computer Science 2026-03-31 Mia Hopman, Jannes Elstner, Maria Avramidou, Amritanshu Prasad, David Lindner

PropensityBench: Evaluating Latent Safety Risks in Large Language Models via an Agentic Approach

Recent advances in Large Language Models (LLMs) have sparked concerns over their potential to acquire and misuse dangerous or high-risk capabilities, posing frontier risks. Current safety evaluations primarily test for what a model…

Computers and Society · Computer Science 2025-11-27 Udari Madhushani Sehwag, Shayan Shabihi, Alex McAvoy, Vikash Sehwag, Yuancheng Xu, Dalton Towers, Furong Huang

Agent-Fence: Mapping Security Vulnerabilities Across Deep Research Agents

Large language models are increasingly deployed as *deep agents* that plan, maintain persistent state, and invoke external tools, shifting safety failures from unsafe text to unsafe *trajectories*. We introduce **AgentFence**, an…

Cryptography and Security · Computer Science 2026-02-10 Sai Puppala, Ismail Hossain, Md Jahangir Alam, Yoonpyo Lee, Jay Yoo, Tanzim Ahad, Syed Bahauddin Alam, Sajedul Talukder

AgentFloor: How Far Up the Tool Use Ladder Can Small Open-Weight Models Go?

Production agentic systems make many model calls per user request, and most of those calls are short, structured, and routine. This raises a practical routing question that existing evaluations do not directly answer: which parts of an…

Artificial Intelligence · Computer Science 2026-05-04 Ranit Karmakar, Jayita Chatterjee

The Design and Demonstration of an Actor-Based, Application-Aware Access Control Evaluation Framework

To date, most work regarding the formal analysis of access control schemes has focused on quantifying and comparing the expressive power of a set of schemes. Although expressive power is important, it is a property that exists in an…

Cryptography and Security · Computer Science 2013-02-06 William C. Garrison, Adam J. Lee, Timothy L. Hinrichs

The Hierarchy of Agentic Capabilities: Evaluating Frontier Models on Realistic RL Environments

The advancement of large language model (LLM) based agents has shifted AI evaluation from single-turn response assessment to multi-step task completion in interactive environments. We present an empirical study evaluating frontier AI models…

Artificial Intelligence · Computer Science 2026-01-15 Logan Ritchie, Sushant Mehta, Nick Heiner, Mason Yu, Edwin Chen

Frontier Models Can Take Actions at Low Probabilities

Pre-deployment evaluations inspect only a limited sample of model actions. A malicious model seeking to evade oversight could exploit this by randomizing when to "defect": misbehaving so rarely that no malicious actions are observed during…

Machine Learning · Computer Science 2026-03-03 Alex Serrano, Wen Xing, David Lindner, Erik Jenner

Adversaries Can Misuse Combinations of Safe Models

Developers try to evaluate whether an AI system can be misused by adversaries before releasing it; for example, they might test whether a model enables cyberoffense, user manipulation, or bioterrorism. In this work, we show that…

Cryptography and Security · Computer Science 2024-07-03 Erik Jones, Anca Dragan, Jacob Steinhardt

Exploration and Exploitation Errors Are Measurable for Language Model Agents

Language Model (LM) agents are increasingly used in complex open-ended decision-making tasks, from AI coding to physical AI. A core requirement in these settings is the ability to both explore the problem space and exploit acquired…

Artificial Intelligence · Computer Science 2026-04-16 Jaden Park, Jungtaek Kim, Jongwon Jeong, Robert D. Nowak, Kangwook Lee, Yong Jae Lee