Related papers: Embedding-based classifiers can detect prompt inje…

Improved Large Language Model Jailbreak Detection via Pretrained Embeddings

The adoption of large language models (LLMs) in many applications, from customer service chatbots and software development assistants to more capable agentic systems, necessitates research into how to secure these systems. Attacks like…

Cryptography and Security · Computer Science 2024-12-03 Erick Galinkin, Martin Sablotny

Applying Pre-trained Multilingual BERT in Embeddings for Improved Malicious Prompt Injection Attacks Detection

Large language models (LLMs) are renowned for their exceptional capabilities and are applied to a wide range of applications. However, this widespread use brings significant vulnerabilities. Also, it is well observed that there are huge gap…

Computation and Language · Computer Science 2024-09-23 Md Abdur Rahman, Hossain Shahriar, Fan Wu, Alfredo Cuzzocrea

Fine-tuned Large Language Models (LLMs): Improved Prompt Injection Attacks Detection

Large language models (LLMs) are becoming a popular tool as they have significantly advanced in their capability to tackle a wide range of language-based tasks. However, LLM applications are highly vulnerable to prompt injection attacks,…

Computation and Language · Computer Science 2024-11-11 Md Abdur Rahman, Fan Wu, Alfredo Cuzzocrea, Sheikh Iqbal Ahamed

Automatic and Universal Prompt Injection Attacks against Large Language Models

Large Language Models (LLMs) excel in processing and generating human language, powered by their ability to interpret and follow instructions. However, their capabilities can be exploited through prompt injection attacks. These attacks…

Artificial Intelligence · Computer Science 2024-03-11 Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, Chaowei Xiao

Detecting Prompt Injection Attacks Against Application Using Classifiers

Prompt injection attacks can compromise the security and stability of critical systems, from infrastructure to large web applications. This work curates and augments a prompt injection dataset based on the HackAPrompt Playground Submissions…

Cryptography and Security · Computer Science 2025-12-16 Safwan Shaheer, G. M. Refatul Islam, Mohammad Rafid Hamid, Md. Abrar Faiaz Khan, Md. Omar Faruk, Yaseen Nur

Privacy-Preserving Prompt Injection Detection for LLMs Using Federated Learning and Embedding-Based NLP Classification

Prompt injection attacks are an emerging threat to large language models (LLMs), enabling malicious users to manipulate outputs through carefully designed inputs. Existing detection approaches often require centralizing prompt data,…

Cryptography and Security · Computer Science 2025-11-18 Hasini Jayathilaka

Palisade -- Prompt Injection Detection Framework

The advent of Large Language Models (LLMs) marks a milestone in Artificial Intelligence, altering how machines comprehend and generate human language. However, LLMs are vulnerable to malicious prompt injection attacks, where crafted inputs…

Computation and Language · Computer Science 2024-10-29 Sahasra Kokkula, Somanathan R, Nandavardhan R, Aashishkumar, G Divya

Prompt-in-Content Attacks: Exploiting Uploaded Inputs to Hijack LLM Behavior

Large Language Models (LLMs) are widely deployed in applications that accept user-submitted content, such as uploaded documents or pasted text, for tasks like summarization and question answering. In this paper, we identify a new class of…

Cryptography and Security · Computer Science 2025-08-28 Zhuotao Lian, Weiyu Wang, Qingkui Zeng, Toru Nakanishi, Teruaki Kitasuka, Chunhua Su

Adversarial Prompt Injection Attack on Multimodal Large Language Models

Although multimodal large language models (MLLMs) are increasingly deployed in real-world applications, their instruction-following behavior leaves them vulnerable to prompt injection attacks. Existing prompt injection methods predominantly…

Computer Vision and Pattern Recognition · Computer Science 2026-04-01 Meiwen Ding, Song Xia, Chenqi Kong, Xudong Jiang

Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

Large Language Models (LLMs) are increasingly being integrated into various applications. The functionalities of recent LLMs can be flexibly modulated via natural language prompts. This renders them susceptible to targeted adversarial…

Cryptography and Security · Computer Science 2023-05-08 Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, Mario Fritz

Misleading Large Language Models used (or misused) in Scientific Peer-Reviewing via Hidden Prompt-Injection Attacks

Large Language Models (LLMs) are increasingly being integrated into the scientific peer-review process, raising new questions about their reliability and resilience to manipulation. In this work, we investigate the potential for hidden…

Cryptography and Security · Computer Science 2026-03-31 Matteo Gioele Collu, Umberto Salviati, Roberto Confalonieri, Mauro Conti, Giovanni Apruzzese

Multimodal Prompt Injection Attacks: Risks and Defenses for Modern LLMs

Large Language Models (LLMs) have seen rapid adoption in recent years, with industries increasingly relying on them to maintain a competitive advantage. These models excel at interpreting user instructions and generating human-like…

Cryptography and Security · Computer Science 2025-09-09 Andrew Yeo, Daeseon Choi

Prompt Inject Detection with Generative Explanation as an Investigative Tool

Large Language Models (LLMs) are vulnerable to adversarial prompt-based injects. These injects could jailbreak or exploit vulnerabilities within these models with explicit prompt requests, leading to undesired responses. In the context of…

Cryptography and Security · Computer Science 2025-02-18 Jonathan Pan, Swee Liang Wong, Yidi Yuan, Xin Wei Chia

Breaking to Build: A Threat Model of Prompt-Based Attacks for Securing LLMs

The proliferation of Large Language Models (LLMs) has introduced critical security challenges, where adversarial actors can manipulate input prompts to cause significant harm and circumvent safety alignments. These prompt-based attacks…

Computation and Language · Computer Science 2025-09-08 Brennen Hill, Surendra Parla, Venkata Abhijeeth Balabhadruni, Atharv Prajod Padmalayam, Sujay Chandra Shekara Sharma

Defending Against Indirect Prompt Injection Attacks With Spotlighting

Large Language Models (LLMs), while powerful, are built and trained to process a single text input. In common applications, multiple inputs can be processed by concatenating them together into a single stream of text. However, the LLM is…

Cryptography and Security · Computer Science 2024-03-25 Keegan Hines, Gary Lopez, Matthew Hall, Federico Zarfati, Yonatan Zunger, Emre Kiciman

Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition

Large Language Models (LLMs) are deployed in interactive contexts with direct user engagement, such as chatbots and writing assistants. These deployments are vulnerable to prompt injection and jailbreaking (collectively, prompt hacking), in…

Cryptography and Security · Computer Science 2024-03-05 Sander Schulhoff, Jeremy Pinto, Anaum Khan, Louis-François Bouchard, Chenglei Si, Svetlina Anati, Valen Tagliabue, Anson Liu Kost, Christopher Carnahan, Jordan Boyd-Graber

UniGuardian: A Unified Defense for Detecting Prompt Injection, Backdoor Attacks and Adversarial Attacks in Large Language Models

Large Language Models (LLMs) are vulnerable to attacks like prompt injection, backdoor attacks, and adversarial attacks, which manipulate prompts or models to generate harmful outputs. In this paper, departing from traditional deep learning…

Computation and Language · Computer Science 2025-02-19 Huawei Lin, Yingjie Lao, Tong Geng, Tan Yu, Weijie Zhao

Hijacking Large Language Models via Adversarial In-Context Learning

In-context learning (ICL) has emerged as a powerful paradigm leveraging LLMs for specific downstream tasks by utilizing labeled examples as demonstrations (demos) in the preconditioned prompts. Despite its promising performance, crafted…

Machine Learning · Computer Science 2025-05-30 Xiangyu Zhou, Yao Qiang, Saleh Zare Zade, Prashant Khanduri, Dongxiao Zhu

Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection

Large Language Models (LLMs) have demonstrated exceptional proficiency in instruction-following, becoming increasingly crucial across various applications. However, this capability brings with it the risk of prompt injection attacks, where…

Computation and Language · Computer Science 2023-11-28 Zekun Li, Baolin Peng, Pengcheng He, Xifeng Yan

SoK: Prompt Hacking of Large Language Models

The safety and robustness of large language model (LLM)-based applications remain critical challenges in artificial intelligence. Among the key threats to these applications are prompt hacking attacks, which can significantly undermine…

Cryptography and Security · Computer Science 2024-10-21 Baha Rababah, Shang, Wu, Matthew Kwiatkowski, Carson Leung, Cuneyt Gurcan Akcora