Related papers: Combing for Credentials: Active Pattern Extraction…

Extracting Training Data from Large Language Models

It has become common to publish large (billion parameter) language models that have been trained on private datasets. This paper demonstrates that in such settings, an adversary can perform a training data extraction attack to recover…

Cryptography and Security · Computer Science 2021-06-16 Nicholas Carlini, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, Colin Raffel

Extracted BERT Model Leaks More Information than You Think!

The collection and availability of big data, combined with advances in pre-trained models (e.g. BERT), have revolutionized the predictive performance of natural language processing tasks. This allows corporations to provide machine learning…

Cryptography and Security · Computer Science 2022-11-01 Xuanli He, Chen Chen, Lingjuan Lyu, Qiongkai Xu

Killing One Bird with Two Stones: Model Extraction and Attribute Inference Attacks against BERT-based APIs

The collection and availability of big data, combined with advances in pre-trained models (e.g., BERT, XLNET, etc.), have revolutionized the predictive performance of modern natural language processing tasks, ranging from text classification…

Cryptography and Security · Computer Science 2021-12-28 Chen Chen, Xuanli He, Lingjuan Lyu, Fangzhao Wu

Effective Prompt Extraction from Language Models

The text generated by large language models is commonly controlled by prompting, where a prompt prepended to a user's query guides the model's output. The prompts used by companies to guide their models are often treated as secrets, to be…

Computation and Language · Computer Science 2024-08-09 Yiming Zhang, Nicholas Carlini, Daphne Ippolito

Reconstruct Your Previous Conversations! Comprehensively Investigating Privacy Leakage Risks in Conversations with GPT Models

Significant advancements have recently been made in large language models represented by GPT models. Users frequently have multi-round private conversations with cloud-hosted GPT models for task optimization. Yet, this operational paradigm…

Cryptography and Security · Computer Science 2024-10-08 Junjie Chu, Zeyang Sha, Michael Backes, Yang Zhang

Towards More Realistic Extraction Attacks: An Adversarial Perspective

Language models are prone to memorizing their training data, making them vulnerable to extraction attacks. While existing research often examines isolated setups, such as a single model or a fixed prompt, real-world adversaries have a…

Cryptography and Security · Computer Science 2025-08-11 Yash More, Prakhar Ganesh, Golnoosh Farnadi

Does Prompt-Tuning Language Model Ensure Privacy?

Prompt-tuning has received attention as an efficient tuning method in the language domain, i.e., tuning a prompt that is a few tokens long, while keeping the large language model frozen, yet achieving comparable performance with…

Cryptography and Security · Computer Science 2023-04-18 Shangyu Xie, Wei Dai, Esha Ghosh, Sambuddha Roy, Dan Schwartz, Kim Laine

A Survey on Model Extraction Attacks and Defenses for Large Language Models

Model extraction attacks pose significant security threats to deployed language models, potentially compromising intellectual property and user privacy. This survey provides a comprehensive taxonomy of LLM-specific extraction attacks and…

Cryptography and Security · Computer Science 2025-07-09 Kaixiang Zhao, Lincan Li, Kaize Ding, Neil Zhenqiang Gong, Yue Zhao, Yushun Dong

Evaluation of Prompt Injection Defenses in Large Language Models

LLM-powered applications routinely embed secrets in system prompts, yet models can be tricked into revealing them. We built an adaptive attacker that evolves its strategies over hundreds of rounds and tested it against nine defense…

Cryptography and Security · Computer Science 2026-05-14 Priyal Deep, Shane Emmons, Amy Fox, Kyle Bacon, Kelley McAllister, Peter Ortiz, Krisztian Flautner

SoK: Reducing the Vulnerability of Fine-tuned Language Models to Membership Inference Attacks

Natural language processing models have experienced a significant upsurge in recent years, with numerous applications being built upon them. Many of these applications require fine-tuning generic base models on customized, proprietary…

Machine Learning · Computer Science 2024-03-14 Guy Amit, Abigail Goldsteen, Ariel Farkash

Analysis of Privacy Leakage in Federated Large Language Models

With the rapid adoption of Federated Learning (FL) as the training and tuning protocol for applications utilizing Large Language Models (LLMs), recent research highlights the need for significant modifications to FL to accommodate the…

Cryptography and Security · Computer Science 2024-03-11 Minh N. Vu, Truc Nguyen, Tre' R. Jeter, My T. Thai

Controlling the Extraction of Memorized Data from Large Language Models via Prompt-Tuning

Large Language Models (LLMs) are known to memorize significant portions of their training data. Parts of this memorized content have been shown to be extractable by simply querying the model, which poses a privacy risk. We present a novel…

Computation and Language · Computer Science 2023-05-22 Mustafa Safa Ozdayi, Charith Peris, Jack FitzGerald, Christophe Dupuy, Jimit Majmudar, Haidar Khan, Rahil Parikh, Rahul Gupta

Model Extraction and Adversarial Transferability, Your BERT is Vulnerable!

Natural language processing (NLP) tasks, ranging from text classification to text generation, have been revolutionised by the pre-trained language models, such as BERT. This allows corporations to easily build powerful APIs by encapsulating…

Computation and Language · Computer Science 2021-03-19 Xuanli He, Lingjuan Lyu, Qiongkai Xu, Lichao Sun

Bag of Tricks for Training Data Extraction from Language Models

With the advance of language models, privacy protection is receiving more attention. Training data extraction is therefore of great importance, as it can serve as a potential tool to assess privacy leakage. However, due to the difficulty of…

Computation and Language · Computer Science 2023-06-02 Weichen Yu, Tianyu Pang, Qian Liu, Chao Du, Bingyi Kang, Yan Huang, Min Lin, Shuicheng Yan

Why Are My Prompts Leaked? Unraveling Prompt Extraction Threats in Customized Large Language Models

The drastic increase of large language models' (LLMs) parameters has led to a new research direction of fine-tuning-free downstream customization by prompts, i.e., task descriptions. While these prompt-based services (e.g. OpenAI's GPTs)…

Computation and Language · Computer Science 2025-02-13 Zi Liang, Haibo Hu, Qingqing Ye, Yaxin Xiao, Haoyang Li

Quantifying Privacy Risks of Prompts in Visual Prompt Learning

Large-scale pre-trained models are increasingly adapted to downstream tasks through a new paradigm called prompt learning. In contrast to fine-tuning, prompt learning does not update the pre-trained model's parameters. Instead, it only…

Cryptography and Security · Computer Science 2023-10-19 Yixin Wu, Rui Wen, Michael Backes, Pascal Berrang, Mathias Humbert, Yun Shen, Yang Zhang

Thief, Beware of What Get You There: Towards Understanding Model Extraction Attack

Model extraction increasingly attracts research attention as keeping commercial AI models private can retain a competitive advantage. In some scenarios, AI models are trained proprietarily, where neither pre-trained models nor sufficient…

Machine Learning · Computer Science 2021-04-14 Xinyi Zhang, Chengfang Fang, Jie Shi

Unlearned but Not Forgotten: Data Extraction after Exact Unlearning in LLM

Large Language Models are typically trained on datasets collected from the web, which may inadvertently contain harmful or sensitive personal information. To address growing privacy concerns, unlearning methods have been proposed to remove…

Machine Learning · Computer Science 2025-10-23 Xiaoyu Wu, Yifei Pang, Terrance Liu, Zhiwei Steven Wu

You Are What You Write: Preserving Privacy in the Era of Large Language Models

Large scale adoption of large language models has introduced a new era of convenient knowledge transfer for a slew of natural language processing tasks. However, these models also run the risk of undermining user trust by exposing unwanted…

Computation and Language · Computer Science 2022-04-21 Richard Plant, Valerio Giuffrida, Dimitra Gkatzia

IDT: Dual-Task Adversarial Attacks for Privacy Protection

Natural language processing (NLP) models may leak private information in different ways, including membership inference, reconstruction or attribute inference attacks. Sensitive information may not be explicit in the text, but hidden in…

Computation and Language · Computer Science 2024-07-01 Pedro Faustini, Shakila Mahjabin Tonni, Annabelle McIver, Qiongkai Xu, Mark Dras