Related papers: Extracting Training Data from Large Language Model…

Combing for Credentials: Active Pattern Extraction from Smart Reply

Pre-trained large language models, such as GPT-2 and BERT, are often fine-tuned to achieve state-of-the-art performance on a downstream task. One natural example is the "Smart Reply" application where a pre-trained model is…

Cryptography and Security · Computer Science · 2023-09-06 · Bargav Jayaraman, Esha Ghosh, Melissa Chase, Sambuddha Roy, Wei Dai, David Evans

Bag of Tricks for Training Data Extraction from Language Models

With the advance of language models, privacy protection is receiving more attention. Training data extraction is therefore of great importance, as it can serve as a potential tool to assess privacy leakage. However, due to the difficulty of…

Computation and Language · Computer Science · 2023-06-02 · Weichen Yu, Tianyu Pang, Qian Liu, Chao Du, Bingyi Kang, Yan Huang, Min Lin, Shuicheng Yan

Traces of Memorisation in Large Language Models for Code

Large language models have gained significant popularity because of their ability to generate human-like text and potential applications in various fields, such as Software Engineering. Large language models for code are commonly trained on…

Cryptography and Security · Computer Science · 2024-01-17 · Ali Al-Kaswan, Maliheh Izadi, Arie van Deursen

Teach LLMs to Phish: Stealing Private Information from Language Models

When large language models are trained on private data, it can be a significant privacy risk for them to memorize and regurgitate sensitive information. In this work, we propose a new practical data extraction attack that we call "neural…

Cryptography and Security · Computer Science · 2024-03-05 · Ashwinee Panda, Christopher A. Choquette-Choo, Zhengming Zhang, Yaoqing Yang, Prateek Mittal

Scalable Extraction of Training Data from (Production) Language Models

This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset. We show an adversary can extract gigabytes of…

Machine Learning · Computer Science · 2023-11-29 · Milad Nasr, Nicholas Carlini, Jonathan Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher A. Choquette-Choo, Eric Wallace, Florian Tramèr, Katherine Lee

Towards More Realistic Extraction Attacks: An Adversarial Perspective

Language models are prone to memorizing their training data, making them vulnerable to extraction attacks. While existing research often examines isolated setups, such as a single model or a fixed prompt, real-world adversaries have a…

Cryptography and Security · Computer Science · 2025-08-11 · Yash More, Prakhar Ganesh, Golnoosh Farnadi

A Survey on Model Extraction Attacks and Defenses for Large Language Models

Model extraction attacks pose significant security threats to deployed language models, potentially compromising intellectual property and user privacy. This survey provides a comprehensive taxonomy of LLM-specific extraction attacks and…

Cryptography and Security · Computer Science · 2025-07-09 · Kaixiang Zhao, Lincan Li, Kaize Ding, Neil Zhenqiang Gong, Yue Zhao, Yushun Dong

Targeted Attack on GPT-Neo for the SATML Language Model Data Extraction Challenge

Previous work has shown that Large Language Models are susceptible to so-called data extraction attacks. This allows an attacker to extract a sample that was contained in the training data, which has massive privacy implications. The…

Computation and Language · Computer Science · 2023-02-16 · Ali Al-Kaswan, Maliheh Izadi, Arie van Deursen

Deduplicating Training Data Mitigates Privacy Risks in Language Models

Past work has shown that large language models are susceptible to privacy attacks, where adversaries generate sequences from a trained model and detect which sequences are memorized from the training set. In this work, we show that the…

Cryptography and Security · Computer Science · 2022-12-21 · Nikhil Kandpal, Eric Wallace, Colin Raffel

Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models

High-quality training data has proven crucial for developing performant large language models (LLMs). However, commercial LLM providers disclose few, if any, details about the data used for training. This lack of transparency creates…

Computation and Language · Computer Science · 2025-03-18 · Abhilasha Ravichander, Jillian Fisher, Taylor Sorensen, Ximing Lu, Yuchen Lin, Maria Antoniak, Niloofar Mireshghallah, Chandra Bhagavatula, Yejin Choi

Effective Prompt Extraction from Language Models

The text generated by large language models is commonly controlled by prompting, where a prompt prepended to a user's query guides the model's output. The prompts used by companies to guide their models are often treated as secrets, to be…

Computation and Language · Computer Science · 2024-08-09 · Yiming Zhang, Nicholas Carlini, Daphne Ippolito

Model Extraction Attacks on Graph Neural Networks: Taxonomy and Realization

Machine learning models are shown to face a severe threat from Model Extraction Attacks, where a well-trained private model owned by a service provider can be stolen by an attacker pretending to be a client. Unfortunately, prior works focus on…

Machine Learning · Computer Science · 2021-12-02 · Bang Wu, Xiangwen Yang, Shirui Pan, Xingliang Yuan

Training Data Leakage Analysis in Language Models

Recent advances in neural network based language models lead to successful deployments of such models, improving user experience in various applications. It has been demonstrated that strong performance of language models comes along with…

Cryptography and Security · Computer Science · 2021-02-24 · Huseyin A. Inan, Osman Ramadan, Lukas Wutschitz, Daniel Jones, Victor Rühle, James Withers, Robert Sim

Memory Backdoor Attacks on Neural Networks

Neural networks are often trained on proprietary datasets, making them attractive attack targets. We present a novel dataset extraction method leveraging an innovative training time backdoor attack, allowing a malicious federated learning…

Cryptography and Security · Computer Science · 2025-12-19 · Eden Luzon, Guy Amit, Roy Weiss, Torsten Krauß, Alexandra Dmitrienko, Yisroel Mirsky

Generative Extraction of Audio Classifiers for Speaker Identification

It is perhaps no longer surprising that machine learning models, especially deep neural networks, are particularly vulnerable to attacks. One such vulnerability that has been well studied is model extraction: a phenomenon in which the…

Cryptography and Security · Computer Science · 2022-07-27 · Tejumade Afonja, Lucas Bourtoule, Varun Chandrasekaran, Sageev Oore, Nicolas Papernot

Training Data Extraction From Pre-trained Language Models: A Survey

As the deployment of pre-trained language models (PLMs) expands, pressing security concerns have arisen regarding the potential for malicious extraction of training data, posing a threat to data privacy. This study is the first to provide a…

Computation and Language · Computer Science · 2023-05-26 · Shotaro Ishihara

Controlling the Extraction of Memorized Data from Large Language Models via Prompt-Tuning

Large Language Models (LLMs) are known to memorize significant portions of their training data. Parts of this memorized content have been shown to be extractable by simply querying the model, which poses a privacy risk. We present a novel…

Computation and Language · Computer Science · 2023-05-22 · Mustafa Safa Ozdayi, Charith Peris, Jack FitzGerald, Christophe Dupuy, Jimit Majmudar, Haidar Khan, Rahil Parikh, Rahul Gupta

Submix: Practical Private Prediction for Large-Scale Language Models

Recent data-extraction attacks have exposed that language models can memorize some training samples verbatim. This is a vulnerability that can compromise the privacy of the model's training data. In this work, we introduce SubMix: a…

Machine Learning · Computer Science · 2022-01-05 · Antonio Ginart, Laurens van der Maaten, James Zou, Chuan Guo

Extracted BERT Model Leaks More Information than You Think!

The collection and availability of big data, combined with advances in pre-trained models (e.g. BERT), have revolutionized the predictive performance of natural language processing tasks. This allows corporations to provide machine learning…

Cryptography and Security · Computer Science · 2022-11-01 · Xuanli He, Chen Chen, Lingjuan Lyu, Qiongkai Xu

Extracting Training Dialogue Data from Large Language Model based Task Bots

Large Language Models (LLMs) have been widely adopted to enhance Task-Oriented Dialogue Systems (TODS) by modeling complex language patterns and delivering contextually appropriate responses. However, this integration introduces significant…

Computation and Language · Computer Science · 2026-03-05 · Shuo Zhang, Junzhou Zhao, Junji Hou, Pinghui Wang, Chenxu Wang, Jing Tao