Related papers: CodeChameleon: Personalized Encryption Framework f…

QueryAttack: Jailbreaking Aligned Large Language Models Using Structured Non-natural Query Language

Recent advances in large language models (LLMs) have demonstrated remarkable potential in the field of natural language processing. Unfortunately, LLMs face significant security and ethical risks. Although techniques such as safety…

Cryptography and Security · Computer Science · 2025-05-27 · Qingsong Zou, Jingyu Xiao, Qing Li, Zhi Yan, Yuhang Wang, Li Xu, Wenxuan Wang, Kuofeng Gao, Ruoyu Li, Yong Jiang

Defending Large Language Models Against Jailbreak Attacks via In-Decoding Safety-Awareness Probing

Large language models (LLMs) have achieved impressive performance across natural language tasks and are increasingly deployed in real-world applications. Despite extensive safety alignment efforts, recent studies show that such alignment is…

Artificial Intelligence · Computer Science · 2026-02-02 · Yinzhi Zhao, Ming Wang, Shi Feng, Xiaocui Yang, Daling Wang, Yifei Zhang

CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal Representations

Security alignment enables the Large Language Model (LLM) to gain the protection against malicious queries, but various jailbreak attack methods reveal the vulnerability of this security mechanism. Previous studies have isolated LLM…

Cryptography and Security · Computer Science · 2025-08-07 · Xiaohu Li, Yunfeng Ning, Zepeng Bao, Mayi Xu, Jianhao Chen, Tieyun Qian

Behind the Mask: Benchmarking Camouflaged Jailbreaks in Large Language Models

Large Language Models (LLMs) are increasingly vulnerable to a sophisticated form of adversarial prompting known as camouflaged jailbreaking. This method embeds malicious intent within seemingly benign language to evade existing safety…

Cryptography and Security · Computer Science · 2025-09-09 · Youjia Zheng, Mohammad Zandsalimy, Shanu Sushmita

Revisiting Jailbreaking for Large Language Models: A Representation Engineering Perspective

The recent surge in jailbreaking attacks has revealed significant vulnerabilities in Large Language Models (LLMs) when exposed to malicious inputs. While various defense strategies have been proposed to mitigate these threats, there has…

Computation and Language · Computer Science · 2025-02-24 · Tianlong Li, Zhenghua Wang, Wenhao Liu, Muling Wu, Shihan Dou, Changze Lv, Xiaohua Wang, Xiaoqing Zheng, Xuanjing Huang

EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models

Jailbreak attacks are crucial for identifying and mitigating the security vulnerabilities of Large Language Models (LLMs). They are designed to bypass safeguards and elicit prohibited outputs. However, due to significant differences among…

Computation and Language · Computer Science · 2024-03-20 · Weikang Zhou, Xiao Wang, Limao Xiong, Han Xia, Yingshuang Gu, Mingxu Chai, Fukang Zhu, Caishuang Huang, Shihan Dou, Zhiheng Xi, Rui Zheng, Songyang Gao, Yicheng Zou, Hang Yan, Yifan Le, Ruohui Wang, Lijun Li, Jing Shao, Tao Gui, Qi Zhang, Xuanjing Huang

Diversity Helps Jailbreak Large Language Models

We have uncovered a powerful jailbreak technique that leverages large language models' ability to diverge from prior context, enabling them to bypass safety constraints and generate harmful outputs. By simply instructing the LLM to deviate…

Computation and Language · Computer Science · 2025-05-13 · Weiliang Zhao, Daniel Ben-Levi, Wei Hao, Junfeng Yang, Chengzhi Mao

Self-Deception: Reverse Penetrating the Semantic Firewall of Large Language Models

Large language models (LLMs), such as ChatGPT, have emerged with astonishing capabilities approaching artificial general intelligence. While providing convenience for various societal needs, LLMs have also lowered the cost of generating…

Computation and Language · Computer Science · 2023-08-28 · Zhenhua Wang, Wei Xie, Kai Chen, Baosheng Wang, Zhiwen Gui, Enze Wang

SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models

Integrated Speech and Large Language Models (SLMs) that can follow speech instructions and generate relevant text responses have gained popularity lately. However, the safety and robustness of these models remains largely unclear. In this…

Computation and Language · Computer Science · 2024-05-15 · Raghuveer Peri, Sai Muralidhar Jayanthi, Srikanth Ronanki, Anshu Bhatia, Karel Mundnich, Saket Dingliwal, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Srikanth Vishnubhotla, Daniel Garcia-Romero, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding

As large language models (LLMs) become increasingly integrated into real-world applications such as code generation and chatbot assistance, extensive efforts have been made to align LLM behavior with human values, including safety.…

Cryptography and Security · Computer Science · 2024-07-29 · Zhangchen Xu, Fengqing Jiang, Luyao Niu, Jinyuan Jia, Bill Yuchen Lin, Radha Poovendran

The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models

Large language models (LLMs) have demonstrated remarkable capabilities, but their power comes with significant security considerations. While extensive research has been conducted on the safety of LLMs in chat mode, the security…

Cryptography and Security · Computer Science · 2024-12-25 · Zihui Wu, Haichang Gao, Jianping He, Ping Wang

SafeBehavior: Simulating Human-Like Multistage Reasoning to Mitigate Jailbreak Attacks in Large Language Models

Large Language Models (LLMs) have achieved impressive performance across diverse natural language processing tasks, but their growing power also amplifies potential risks such as jailbreak attacks that circumvent built-in safety mechanisms.…

Artificial Intelligence · Computer Science · 2025-10-01 · Qinjian Zhao, Jiaqi Wang, Zhiqiang Gao, Zhihao Dou, Belal Abuhaija, Kaizhu Huang

Smoke and Mirrors: Jailbreaking LLM-based Code Generation via Implicit Malicious Prompts

The proliferation of Large Language Models (LLMs) has revolutionized natural language processing and significantly impacted code generation tasks, enhancing software development efficiency and productivity. Notably, LLMs like GPT-4 have…

Software Engineering · Computer Science · 2025-03-25 · Sheng Ouyang, Yihao Qin, Bo Lin, Liqian Chen, Xiaoguang Mao, Shangwen Wang

NeuroBreak: Unveil Internal Jailbreak Mechanisms in Large Language Models

In deployment and application, large language models (LLMs) typically undergo safety alignment to prevent illegal and unethical outputs. However, the continuous advancement of jailbreak attack techniques, designed to bypass safety…

Cryptography and Security · Computer Science · 2025-09-05 · Chuhan Zhang, Ye Zhang, Bowen Shi, Yuyou Gan, Tianyu Du, Shouling Ji, Dazhan Deng, Yingcai Wu

MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots

Large Language Models (LLMs) have revolutionized Artificial Intelligence (AI) services due to their exceptional proficiency in understanding and generating human-like text. LLM chatbots, in particular, have seen widespread adoption,…

Cryptography and Security · Computer Science · 2024-02-14 · Gelei Deng, Yi Liu, Yuekang Li, Kailong Wang, Ying Zhang, Zefeng Li, Haoyu Wang, Tianwei Zhang, Yang Liu

When "Competency" in Reasoning Opens the Door to Vulnerability: Jailbreaking LLMs via Novel Complex Ciphers

Recent advancements in Large Language Model (LLM) safety have primarily focused on mitigating attacks crafted in natural language or common ciphers (e.g. Base64), which are likely integrated into newer models' safety training. However, we…

Computation and Language · Computer Science · 2025-10-15 · Divij Handa, Zehua Zhang, Amir Saeidi, Shrinidhi Kumbhar, Md Nayem Uddin, Aswin RRV, Chitta Baral

SeqAR: Jailbreak LLMs with Sequential Auto-Generated Characters

The widespread applications of large language models (LLMs) have brought about concerns regarding their potential misuse. Although aligned with human preference data before release, LLMs remain vulnerable to various malicious attacks. In…

Cryptography and Security · Computer Science · 2025-03-04 · Yan Yang, Zeguan Xiao, Xin Lu, Hongru Wang, Xuetao Wei, Hailiang Huang, Guanhua Chen, Yun Chen

Chain-of-Lure: A Universal Jailbreak Attack Framework using Unconstrained Synthetic Narratives

In the era of rapid generative AI development, interactions with large language models (LLMs) pose increasing risks of misuse. Prior research has primarily focused on attacks using template-based prompts and optimization-oriented methods,…

Cryptography and Security · Computer Science · 2026-03-03 · Wenhan Chang, Tianqing Zhu, Yu Zhao, Shuangyong Song, Ping Xiong, Wanlei Zhou

Can Large Language Models Automatically Jailbreak GPT-4V?

GPT-4V has attracted considerable attention due to its extraordinary capacity for integrating and processing multimodal information. At the same time, its ability of face recognition raises new safety concerns of privacy leakage. Despite…

Computation and Language · Computer Science · 2024-08-26 · Yuanwei Wu, Yue Huang, Yixin Liu, Xiang Li, Pan Zhou, Lichao Sun

An LLM-Assisted Easy-to-Trigger Backdoor Attack on Code Completion Models: Injecting Disguised Vulnerabilities against Strong Detection

Large Language Models (LLMs) have transformed code completion tasks, providing context-based suggestions to boost developer productivity in software engineering. As users often fine-tune these models for specific applications, poisoning and…

Cryptography and Security · Computer Science · 2024-06-12 · Shenao Yan, Shen Wang, Yue Duan, Hanbin Hong, Kiho Lee, Doowon Kim, Yuan Hong