Related papers: PrivCode: When Code Generation Meets Differential …

Towards Privacy-Preserving Code Generation: Differentially Private Code Language Models

Large language models specialized for code (CodeLLMs) have demonstrated remarkable capabilities in generating code snippets, documentation, and test cases. However, despite their promising capabilities, CodeLLMs can inadvertently memorize…

Software Engineering · Computer Science 2025-12-15 Melih Catal, Pooja Rani, Harald C. Gall

SafeSynthDP: Leveraging Large Language Models for Privacy-Preserving Synthetic Data Generation Using Differential Privacy

Machine learning (ML) models frequently rely on training data that may include sensitive or personal information, raising substantial privacy concerns. Legislative frameworks such as the General Data Protection Regulation (GDPR) and the…

Machine Learning · Computer Science 2024-12-31 Md Mahadi Hasan Nahid, Sadid Bin Hasan

Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe

Privacy concerns have attracted increasing attention in data-driven products due to the tendency of machine learning models to memorize sensitive training data. Generating synthetic versions of such data with a formal privacy guarantee,…

Computation and Language · Computer Science 2023-07-19 Xiang Yue, Huseyin A. Inan, Xuechen Li, Girish Kumar, Julia McAnallen, Hoda Shajari, Huan Sun, David Levitan, Robert Sim

Synthetic Query Generation for Privacy-Preserving Deep Retrieval Systems using Differentially Private Language Models

We address the challenge of ensuring differential privacy (DP) guarantees in training deep retrieval systems. Training these systems often involves the use of contrastive-style losses, which are typically non-per-example decomposable,…

Computation and Language · Computer Science 2024-05-24 Aldo Gael Carranza, Rezsa Farahani, Natalia Ponomareva, Alex Kurakin, Matthew Jagielski, Milad Nasr

DP-2Stage: Adapting Language Models as Differentially Private Tabular Data Generators

Generating tabular data under differential privacy (DP) protection ensures theoretical privacy guarantees but poses challenges for training machine learning models, primarily due to the need to capture complex structures under noisy…

Machine Learning · Computer Science 2025-04-30 Tejumade Afonja, Hui-Po Wang, Raouf Kerkouche, Mario Fritz

Privacy Preserving In-Context-Learning Framework for Large Language Models

Large language models (LLMs) have significantly transformed natural language understanding and generation, but they raise privacy concerns due to potential exposure of sensitive information. Studies have highlighted the risk of information…

Machine Learning · Computer Science 2025-11-20 Bishnu Bhusal, Manoj Acharya, Ramneet Kaur, Colin Samplawski, Anirban Roy, Adam D. Cobb, Rohit Chadha, Susmit Jha

Private prediction for large-scale synthetic text generation

We present an approach for generating differentially private synthetic text using large language models (LLMs), via private prediction. In the private prediction framework, we only require the output synthetic data to satisfy differential…

Machine Learning · Computer Science 2024-10-10 Kareem Amin, Alex Bie, Weiwei Kong, Alexey Kurakin, Natalia Ponomareva, Umar Syed, Andreas Terzis, Sergei Vassilvitskii

Differentially Private Tabular Data Synthesis using Large Language Models

Synthetic tabular data generation with differential privacy is a crucial problem to enable data sharing with formal privacy. Despite a rich history of methodological research and development, developing differentially private tabular data…

Machine Learning · Computer Science 2024-06-05 Toan V. Tran, Li Xiong

Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning

Large language models (LLMs) have emerged as powerful tools for tackling complex tasks across diverse domains, but they also raise privacy concerns when fine-tuned on sensitive data due to potential memorization. While differential privacy…

Computation and Language · Computer Science 2024-08-19 Lynn Chua, Badih Ghazi, Yangsibo Huang, Pritish Kamath, Ravi Kumar, Daogao Liu, Pasin Manurangsi, Amer Sinha, Chiyuan Zhang

PrivImage: Differentially Private Synthetic Image Generation using Diffusion Models with Semantic-Aware Pretraining

Differential Privacy (DP) image data synthesis leverages the DP technique to generate synthetic data to replace the sensitive data, allowing organizations to share and utilize synthetic images without privacy concerns. Previous…

Computer Vision and Pattern Recognition · Computer Science 2024-10-10 Kecen Li, Chen Gong, Zhixiang Li, Yuzhong Zhao, Xinwen Hou, Tianhao Wang

How to DP-fy Your Data: A Practical Guide to Generating Synthetic Data With Differential Privacy

High-quality data is needed to unlock the full potential of AI for end users. However, finding new sources of such data is getting harder: most publicly available human-generated data will soon have been used. Additionally, publicly…

Cryptography and Security · Computer Science 2025-12-04 Natalia Ponomareva, Zheng Xu, H. Brendan McMahan, Peter Kairouz, Lucas Rosenblatt, Vincent Cohen-Addad, Cristóbal Guzmán, Ryan McKenna, Galen Andrew, Alex Bie, Da Yu, Alex Kurakin, Morteza Zadimoghaddam, Sergei Vassilvitskii, Andreas Terzis

Can Differentially Private Fine-tuning LLMs Protect Against Privacy Attacks?

Fine-tuning large language models (LLMs) has become an essential strategy for adapting them to specialized tasks; however, this process introduces significant privacy challenges, as sensitive training data may be inadvertently memorized and…

Cryptography and Security · Computer Science 2025-05-02 Hao Du, Shang Liu, Yang Cao

SynBench: A Benchmark for Differentially Private Text Generation

Synthetic text generation with Differential Privacy (DP) guarantees emerges as a principled approach that can enable the sharing of sensitive datasets across institutional and regulatory boundaries, while bounding the risks of…

Artificial Intelligence · Computer Science 2026-05-08 Yidan Sun, Viktor Schlegel, Srinivasan Nandakumar, Iqra Zahid, Yuping Wu, Yulong Wu, Hao Li, Jie Zhang, Warren Del-Pinto, Goran Nenadic, Siew Kei Lam, Anil Anthony Bharath

A Review of Privacy Metrics for Privacy-Preserving Synthetic Data Generation

Privacy Preserving Synthetic Data Generation (PP-SDG) has emerged to produce synthetic datasets from personal data while maintaining privacy and utility. Differential privacy (DP) is the property of a PP-SDG mechanism that establishes how…

Cryptography and Security · Computer Science 2025-07-23 Frederik Marinus Trudslev, Matteo Lissandrini, Juan Manuel Rodriguez, Martin Bøgsted, Daniele Dell'Aglio

Differentially Private Language Models for Secure Data Sharing

To protect the privacy of individuals whose data is being shared, it is of high importance to develop methods allowing researchers and companies to release textual data while providing formal privacy guarantees to its originators. In the…

Machine Learning · Computer Science 2022-10-27 Justus Mattern, Zhijing Jin, Benjamin Weggenmann, Bernhard Schoelkopf, Mrinmaya Sachan

DP-SelFT: Differentially Private Selective Fine-Tuning for Large Language Models

Large language models (LLMs) are commonly adapted to downstream tasks through fine-tuning, but fine-tuning data often contains sensitive information that may be leaked by the resulting model. Differential privacy (DP) offers formal…

Machine Learning · Computer Science 2026-05-19 Haichao Sha, Zihao Wang, Yuncheng Wu, Hong Chen, Wei Dong

DP-OPD: Differentially Private On-Policy Distillation for Language Models

Large language models (LLMs) are increasingly adapted to proprietary and domain-specific corpora that contain sensitive information, creating a tension between formal privacy guarantees and efficient deployment through model compression.…

Machine Learning · Computer Science 2026-04-07 Fatemeh Khadem, Sajad Mousavi, Yi Fang, Yuhong Liu

Aim High, Stay Private: Differentially Private Synthetic Data Enables Public Release of Behavioral Health Information with High Utility

Sharing health and behavioral data raises significant privacy concerns, as conventional de-identification methods are susceptible to privacy attacks. Differential Privacy (DP) provides formal guarantees against re-identification risks, but…

Cryptography and Security · Computer Science 2026-04-23 Mohsen Ghasemizade, Juniper Lovato, Christopher M. Danforth, Peter Sheridan Dodds, Laura S. P. Bloomfield, Matthew Price, Team LEMURS, Joseph P. Near

DPGen: Automated Program Synthesis for Differential Privacy

Differential privacy has become a de facto standard for releasing data in a privacy-preserving way. Creating a differentially private algorithm is a process that often starts with a noise-free (non-private) algorithm. The designer then…

Cryptography and Security · Computer Science 2021-09-16 Yuxin Wang, Zeyu Ding, Yingtai Xiao, Daniel Kifer, Danfeng Zhang

Differentially-Private Data Synthetisation for Efficient Re-Identification Risk Control

Protecting user data privacy can be achieved via many methods, from statistical transformations to generative models. However, all of them have critical drawbacks. For example, creating a transformed data set using traditional techniques is…

Machine Learning · Computer Science 2024-04-24 Tânia Carvalho, Nuno Moniz, Luís Antunes, Nitesh Chawla