Related papers: GradingAttack: Exposing Security Vulnerabilities i…

Evaluating Answer Leakage Robustness of LLM Tutors against Adversarial Student Attacks

Large Language Models (LLMs) are increasingly used in education, yet their default helpfulness often conflicts with pedagogical principles. Prior work evaluates pedagogical quality via answer leakage -- the disclosure of complete solutions…

Cryptography and Security · Computer Science · 2026-04-22 · Jin Zhao, Marta Knežević, Tanja Käser

Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks

Large Language Models (LLMs) are swiftly advancing in architecture and capability, and as they integrate more deeply into complex systems, the urgency to scrutinize their security properties grows. This paper surveys research in the…

Computation and Language · Computer Science · 2023-10-18 · Erfan Shayegani, Md Abdullah Al Mamun, Yu Fu, Pedram Zaree, Yue Dong, Nael Abu-Ghazaleh

A Survey of Attacks on Large Language Models

Large language models (LLMs) and LLM-based agents have been widely deployed in a wide range of applications in the real world, including healthcare diagnostics, financial analysis, customer support, robotics, and autonomous driving,…

Cryptography and Security · Computer Science · 2025-05-20 · Wenrui Xu, Keshab K. Parhi

TAMAS: Benchmarking Adversarial Risks in Multi-Agent LLM Systems

Large Language Models (LLMs) have demonstrated strong capabilities as autonomous agents through tool use, planning, and decision-making abilities, leading to their widespread adoption across diverse tasks. As task complexity grows,…

Multiagent Systems · Computer Science · 2025-11-10 · Ishan Kavathekar, Hemang Jain, Ameya Rathod, Ponnurangam Kumaraguru, Tanuja Ganu

Your Agent Can Defend Itself against Backdoor Attacks

Despite their growing adoption across domains, large language model (LLM)-powered agents face significant security risks from backdoor attacks during training and fine-tuning. These compromised agents can subsequently be manipulated to…

Cryptography and Security · Computer Science · 2025-06-12 · Li Changjiang, Liang Jiacheng, Cao Bochuan, Chen Jinghui, Wang Ting

Adversarial Reinforcement Learning for Large Language Model Agent Safety

Large Language Model (LLM) agents can leverage tools such as Google Search to complete complex tasks. However, this tool usage introduces the risk of indirect prompt injections, where malicious instructions hidden in tool outputs can…

Machine Learning · Computer Science · 2025-10-08 · Zizhao Wang, Dingcheng Li, Vaishakh Keshava, Phillip Wallis, Ananth Balashankar, Peter Stone, Lukas Rutishauser

Towards LLM-based Autograding for Short Textual Answers

Grading exams is an important, labor-intensive, subjective, repetitive, and frequently challenging task. The feasibility of autograding textual responses has greatly increased thanks to the availability of large language models (LLMs) such…

Computation and Language · Computer Science · 2024-07-09 · Johannes Schneider, Bernd Schenk, Christina Niklaus

Automating Prompt Leakage Attacks on Large Language Models Using Agentic Approach

This paper presents a novel approach to evaluating the security of large language models (LLMs) against prompt leakage -- the exposure of system-level prompts or proprietary configurations. We define prompt leakage as a critical threat to…

Cryptography and Security · Computer Science · 2025-02-19 · Tvrtko Sternak, Davor Runje, Dorian Granoša, Chi Wang

Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges

Agentic AI systems powered by large language models (LLMs) and endowed with planning, tool use, memory, and autonomy, are emerging as powerful, flexible platforms for automation. Their ability to autonomously execute tasks across web,…

Artificial Intelligence · Computer Science · 2026-04-07 · Anshuman Chhabra, Shrestha Datta, Shahriar Kabir Nahin, Prasant Mohapatra

ATAG: AI-Agent Application Threat Assessment with Attack Graphs

Evaluating the security of multi-agent systems (MASs) powered by large language models (LLMs) is challenging, primarily because of the systems' complex internal dynamics and the evolving nature of LLM vulnerabilities. Traditional attack…

Cryptography and Security · Computer Science · 2025-06-04 · Parth Atulbhai Gandhi, Akansha Shukla, David Tayouri, Beni Ifland, Yuval Elovici, Rami Puzis, Asaf Shabtai

How to Trick Your AI TA: A Systematic Study of Academic Jailbreaking in LLM Code Evaluation

The use of Large Language Models (LLMs) as automatic judges for code evaluation is becoming increasingly prevalent in academic environments, but their reliability can be compromised by students who may employ adversarial prompting…

Software Engineering · Computer Science · 2026-02-04 · Devanshu Sahoo, Vasudev Majhi, Arjun Neekhra, Yash Sinha, Murari Mandal, Dhruv Kumar

Agent Security Bench (ASB): Formalizing and Benchmarking Attacks and Defenses in LLM-based Agents

Although LLM-based agents, powered by Large Language Models (LLMs), can use external tools and memory mechanisms to solve complex real-world tasks, they may also introduce critical security vulnerabilities. However, the existing literature…

Cryptography and Security · Computer Science · 2025-06-02 · Hanrong Zhang, Jingyuan Huang, Kai Mei, Yifei Yao, Zhenting Wang, Chenlu Zhan, Hongwei Wang, Yongfeng Zhang

From Secure Agentic AI to Secure Agentic Web: Challenges, Threats, and Future Directions

Large Language Models (LLMs) are increasingly deployed as agentic systems that plan, memorize, and act in open-world environments. This shift brings new security problems: failures are no longer only unsafe text generation, but can become…

Cryptography and Security · Computer Science · 2026-03-03 · Zhihang Deng, Jiaping Gui, Weinan Zhang

Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security

Agentic AI systems -- Large Language Models (LLMs) augmented with planning, tool use, memory, and long-horizon interactions -- can execute complex tasks autonomously, but their multi-step trajectories introduce new failure modes that…

Artificial Intelligence · Computer Science · 2026-05-26 · Jinhu Qi, Muzhi Li, Jiahong Liu, Yuqin Shu, Dianzhi Yu, Shicheng Ma, Wenqian Cui, Yiyang Zhao, Yiyi Chen, Ruoxi Jiang, Irwin King, Zenglin Xu

The Dark Side of LLMs: Agent-based Attack Vectors for System-level Compromise

The rapid adoption of Large Language Model (LLM) agents and multi-agent systems enables remarkable capabilities in natural language processing and generation. However, these systems introduce security vulnerabilities that extend beyond…

Cryptography and Security · Computer Science · 2026-05-12 · Matteo Lupinacci, Francesco Aurelio Pironti, Francesco Blefari, Francesco Romeo, Luigi Arena, Angelo Furfaro

Security Attack and Defense Strategies for Autonomous Agent Frameworks: A Layered Review with OpenClaw as a Case Study

Autonomous agent frameworks built upon large language models (LLMs) are evolving into complex, tool-integrated, and continuously operating systems, introducing security risks beyond traditional prompt-level vulnerabilities. As this paradigm…

Cryptography and Security · Computer Science · 2026-05-01 · Luyao Xu, Xiang Chen

Confidence Estimation in Automatic Short Answer Grading with LLMs

Automatic Short Answer Grading (ASAG) with generative large language models (LLMs) has recently demonstrated strong performance without task-specific fine-tuning, while also enabling the generation of synthetic feedback for educational…

Computation and Language · Computer Science · 2026-05-14 · Longwei Cong, Sonja Hahn, Sebastian Gombert, Leon Camus, Hendrik Drachsler, Ulf Kroehne

Exploring Vulnerabilities and Protections in Large Language Models: A Survey

As Large Language Models (LLMs) increasingly become key components in various AI applications, understanding their security vulnerabilities and the effectiveness of defense mechanisms is crucial. This survey examines the security challenges…

Machine Learning · Computer Science · 2024-06-04 · Frank Weizhen Liu, Chenhui Hu

An LLM can Fool Itself: A Prompt-Based Adversarial Attack

The wide-ranging applications of large language models (LLMs), especially in safety-critical domains, necessitate the proper evaluation of the LLM's adversarial robustness. This paper proposes an efficient tool to audit the LLM's…

Cryptography and Security · Computer Science · 2023-10-23 · Xilie Xu, Keyi Kong, Ning Liu, Lizhen Cui, Di Wang, Jingfeng Zhang, Mohan Kankanhalli

garak: A Framework for Security Probing Large Language Models

As Large Language Models (LLMs) are deployed and integrated into thousands of applications, the need for scalable evaluation of how models respond to adversarial attacks grows rapidly. However, LLM security is a moving target: models…

Computation and Language · Computer Science · 2024-06-18 · Leon Derczynski, Erick Galinkin, Jeffrey Martin, Subho Majumdar, Nanna Inie