Related papers: Adversarial Tokenization

200 papers

Although pre-trained language models (PrLMs) have achieved significant success, recent studies demonstrate that PrLMs are vulnerable to adversarial attacks. By generating adversarial examples with slight perturbations on different levels…

Computation and Language · Computer Science 2022-08-23 Jiayi Wang, Rongzhou Bao, Zhuosheng Zhang, Hai Zhao

Large Language Models (LLMs), including GPT-3.5, LLaMA, and PaLM, seem to be knowledgeable and able to adapt to many tasks. However, we still cannot completely trust their answers, since LLMs suffer from hallucination…

Computation and Language · Computer Science 2024-08-06 Jia-Yu Yao, Kun-Peng Ning, Zhen-Hui Liu, Mu-Nan Ning, Yu-Yang Liu, Li Yuan

As powerful Large Language Models (LLMs) are now widely used for numerous practical applications, their safety is of critical importance. While alignment techniques have significantly improved overall safety, LLMs remain vulnerable to…

Machine Learning · Computer Science 2024-10-28 Samuel Jacob Chacko, Sajib Biswas, Chashi Mahiul Islam, Fatema Tabassum Liza, Xiuwen Liu

To prevent Text-to-Image (T2I) models from generating unethical images, people deploy safety filters to block inappropriate drawing prompts. Previous works have employed token replacement to search adversarial prompts that attempt to bypass…

Artificial Intelligence · Computer Science 2024-11-27 Yimo Deng, Huangxun Chen

Many adversarial attacks target natural language processing systems, most of which succeed through modifying the individual tokens of a document. Despite the apparent uniqueness of each of these attacks, fundamentally they are simply a…

Computation and Language · Computer Science 2024-01-09 Tom Roth, Yansong Gao, Alsharif Abuadbba, Surya Nepal, Wei Liu

Although safely enhanced Large Language Models (LLMs) have achieved remarkable success in tackling various complex tasks in a zero-shot manner, they remain susceptible to jailbreak attacks, particularly the unknown jailbreak attack. To…

Computation and Language · Computer Science 2024-06-12 Fan Liu, Zhao Xu, Hao Liu

Large language models (LLMs) are vulnerable to adversarial attacks that add malicious tokens to an input prompt to bypass the safety guardrails of an LLM and cause it to produce harmful content. In this work, we introduce erase-and-check,…

Computation and Language · Computer Science 2025-02-06 Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, Soheil Feizi, Himabindu Lakkaraju

We study cross-lingual sequence tagging with little or no labeled data in the target language. Adversarial training has previously been shown to be effective for training cross-lingual sentence classifiers. However, it is not clear if…

Computation and Language · Computer Science 2018-08-15 Heike Adel, Anton Bryl, David Weiss, Aliaksei Severyn

As Large Language Models quickly become ubiquitous, it becomes critical to understand their security vulnerabilities. Recent work shows that text optimizers can produce jailbreaking prompts that bypass moderation and alignment. Drawing from…

Large language models (LLMs) are known to be vulnerable to jailbreak attacks, which typically rely on carefully designed prompts containing explicit semantic structure. These attacks generally operate by fixing an adversarial instruction…

Machine Learning · Computer Science 2026-05-07 Marco Rando, Samuel Vaiter

The growth of highly advanced Large Language Models (LLMs) constitutes a huge dual-use problem, making it necessary to create dependable AI-generated text detection systems. Modern detectors are notoriously vulnerable to adversarial…

Cryptography and Security · Computer Science 2025-10-06 Lekkala Sai Teja, Annepaka Yadagiri, Sangam Sai Anish, Siva Gopala Krishna Nuthakki, Partha Pakray

Despite significant ongoing efforts in safety alignment, large language models (LLMs) such as GPT-4 and LLaMA 3 remain vulnerable to jailbreak attacks that can induce harmful behaviors, including through the use of adversarial suffixes.…

Cryptography and Security · Computer Science 2024-12-20 Wei Zhao, Zhe Li, Yige Li, Jun Sun

Large Language Models (LLMs) are typically shipped with tokenizers that deterministically encode text into so-called canonical token sequences, to which the LLMs assign probability values. One common assumption is that the probability of a…

Computation and Language · Computer Science 2025-06-09 Renato Lui Geh, Honghua Zhang, Kareem Ahmed, Benjie Wang, Guy Van den Broeck

Large Language Models (LLMs) have shown remarkable capabilities in language understanding and generation. Nonetheless, it was also witnessed that LLMs tend to produce inaccurate responses to specific queries. This deficiency can be traced…

Computation and Language · Computer Science 2025-05-16 Dixuan Wang, Yanda Li, Junyuan Jiang, Zepeng Ding, Ziqin Luo, Guochao Jiang, Jiaqing Liang, Deqing Yang

Benchmarking outcomes increasingly govern trust, selection, and deployment of LLMs, yet these evaluations remain vulnerable to semantically equivalent adversarial perturbations. Prior work on adversarial robustness in NLP has emphasized…

Machine Learning · Computer Science 2025-10-16 Ivan Dubrovsky, Anastasia Orlova, Illarion Iov, Nina Gubina, Irena Gureeva, Alexey Zaytsev

The widespread adoption of code language models in software engineering tasks has exposed vulnerabilities to adversarial attacks, especially the identifier substitution attacks. Although existing identifier substitution attackers…

Software Engineering · Computer Science 2025-04-29 Wenhan Mu, Ling Xu, Shuren Pei, Le Mi, Huichi Zhou

Large language models (LLMs) have become the backbone of modern natural language processing but pose privacy concerns about leaking sensitive training data. Membership inference attacks (MIAs), which aim to infer whether a sample is…

Machine Learning · Computer Science 2025-06-03 Toan Tran, Ruixuan Liu, Li Xiong

Large Language Models (LLMs) have seen widespread adoption across multiple domains, creating an urgent need for robust safety alignment mechanisms. However, robustness remains challenging due to jailbreak attacks that bypass alignment via…

Machine Learning · Computer Science 2026-05-04 Hicham Eddoubi, Umar Faruk Abdullahi, Fadi Hassan

In spite of the successful application in many fields, machine learning models today suffer from notorious problems like vulnerability to adversarial examples. Beyond falling into the cat-and-mouse game between adversarial attack and…

Artificial Intelligence · Computer Science 2022-07-06 Jitao Sang, Xian Zhao, Jiaming Zhang, Zhiyu Lin

Adversarial attacks are a major challenge faced by current machine learning research. These purposely crafted inputs fool even the most advanced models, precluding their deployment in safety-critical applications. Extensive research in…

Artificial Intelligence · Computer Science 2023-06-30 Edoardo Mosca, Shreyash Agarwal, Javier Rando, Georg Groh