Related papers: Differentially Private Data Generation with Missin…

Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe

Privacy concerns have attracted increasing attention in data-driven products due to the tendency of machine learning models to memorize sensitive training data. Generating synthetic versions of such data with a formal privacy guarantee,…

Computation and Language · Computer Science 2023-07-19 Xiang Yue, Huseyin A. Inan, Xuechen Li, Girish Kumar, Julia McAnallen, Hoda Shajari, Huan Sun, David Levitan, Robert Sim

How to DP-fy Your Data: A Practical Guide to Generating Synthetic Data With Differential Privacy

High-quality data is needed to unlock the full potential of AI for end users. However, finding new sources of such data is getting harder: most publicly available human-generated data will soon have been used. Additionally, publicly…

Cryptography and Security · Computer Science 2025-12-04 Natalia Ponomareva, Zheng Xu, H. Brendan McMahan, Peter Kairouz, Lucas Rosenblatt, Vincent Cohen-Addad, Cristóbal Guzmán, Ryan McKenna, Galen Andrew, Alex Bie, Da Yu, Alex Kurakin, Morteza Zadimoghaddam, Sergei Vassilvitskii, Andreas Terzis

Are Data Experts Buying into Differentially Private Synthetic Data? Gathering Community Perspectives

Data privacy is a core tenet of responsible computing, and in the United States, differential privacy (DP) is the dominant technical operationalization of privacy-preserving data analysis. With this study, we qualitatively examine one class…

Human-Computer Interaction · Computer Science 2024-12-18 Lucas Rosenblatt, Bill Howe, Julia Stoyanovich

Generating Synthetic Data with Formal Privacy Guarantees: State of the Art and the Road Ahead

Privacy-preserving synthetic data offers a promising solution to harness segregated data in high-stakes domains where information is compartmentalized for regulatory, privacy, or institutional reasons. This survey provides a comprehensive…

Cryptography and Security · Computer Science 2025-03-28 Viktor Schlegel, Anil A Bharath, Zilong Zhao, Kevin Yee

Differentially Private Synthetic Data Generation for Relational Databases

Existing differentially private (DP) synthetic data generation mechanisms typically assume a single-source table. In practice, data is often distributed across multiple tables with relationships across tables. In this paper, we introduce…

Machine Learning · Computer Science 2025-01-22 Kaveh Alimohammadi, Hao Wang, Ojas Gulati, Akash Srivastava, Navid Azizan

Synthesizing Grid Data with Cyber Resilience and Privacy Guarantees

Differential privacy (DP) provides a principled approach to synthesizing data (e.g., loads) from real-world power systems while limiting the exposure of sensitive information. However, adversaries may exploit synthetic data to calibrate…

Systems and Control · Electrical Eng. & Systems 2025-05-05 Shengyang Wu, Vladimir Dvorkin

Does Differentially Private Synthetic Data Lead to Synthetic Discoveries?

Background: Synthetic data has been proposed as a solution for sharing anonymized versions of sensitive biomedical datasets. Ideally, synthetic data should preserve the structure and statistical properties of the original data, while…

Machine Learning · Computer Science 2024-10-24 Ileana Montoya Perez, Parisa Movahedi, Valtteri Nieminen, Antti Airola, Tapio Pahikkala

Evaluating Differentially Private Generation of Domain-Specific Text

Generative AI offers transformative potential for high-stakes domains such as healthcare and finance, yet privacy and regulatory barriers hinder the use of real-world data. To address this, differentially private synthetic data generation…

Machine Learning · Computer Science 2025-09-01 Yidan Sun, Viktor Schlegel, Srinivasan Nandakumar, Iqra Zahid, Yuping Wu, Warren Del-Pinto, Goran Nenadic, Siew-Kei Lam, Jie Zhang, Anil A Bharath

Towards Biologically Plausible and Private Gene Expression Data Generation

Generative models trained with Differential Privacy (DP) are becoming increasingly prominent in the creation of synthetic data for downstream applications. Existing literature, however, primarily focuses on basic benchmarking datasets and…

Cryptography and Security · Computer Science 2024-02-08 Dingfan Chen, Marie Oestreich, Tejumade Afonja, Raouf Kerkouche, Matthias Becker, Mario Fritz

Synthetic Query Generation for Privacy-Preserving Deep Retrieval Systems using Differentially Private Language Models

We address the challenge of ensuring differential privacy (DP) guarantees in training deep retrieval systems. Training these systems often involves the use of contrastive-style losses, which are typically non-per-example decomposable,…

Computation and Language · Computer Science 2024-05-24 Aldo Gael Carranza, Rezsa Farahani, Natalia Ponomareva, Alex Kurakin, Matthew Jagielski, Milad Nasr

pMSE Mechanism: Differentially Private Synthetic Data with Maximal Distributional Similarity

We propose a method for the release of differentially private synthetic datasets. In many contexts, data contain sensitive values which cannot be released in their original form in order to protect individuals' privacy. Synthetic data is a…

Methodology · Statistics 2018-05-25 Joshua Snoke, Aleksandra Slavković

Spending Privacy Budget Fairly and Wisely

Differentially private (DP) synthetic data generation is a practical method for improving access to data as a means to encourage productive partnerships. One issue inherent to DP is that the "privacy budget" is generally "spent" evenly…

Machine Learning · Computer Science 2022-08-11 Lucas Rosenblatt, Joshua Allen, Julia Stoyanovich

A Review of Privacy Metrics for Privacy-Preserving Synthetic Data Generation

Privacy Preserving Synthetic Data Generation (PP-SDG) has emerged to produce synthetic datasets from personal data while maintaining privacy and utility. Differential privacy (DP) is the property of a PP-SDG mechanism that establishes how…

Cryptography and Security · Computer Science 2025-07-23 Frederik Marinus Trudslev, Matteo Lissandrini, Juan Manuel Rodriguez, Martin Bøgsted, Daniele Dell'Aglio

Privacy-Preserving Synthetic Datasets Over Weakly Constrained Domains

Techniques to deliver privacy-preserving synthetic datasets take a sensitive dataset as input and produce a similar dataset as output while maintaining differential privacy. These approaches have the potential to improve data sharing and…

Databases · Computer Science 2018-08-24 Luke Rodriguez, Bill Howe

An Analysis of the Deployment of Models Trained on Private Tabular Synthetic Data: Unexpected Surprises

Differentially private (DP) synthetic datasets are a powerful approach for training machine learning models while respecting the privacy of individual data providers. The effect of DP on the fairness of the resulting trained models is not…

Machine Learning · Statistics 2021-06-21 Mayana Pereira, Meghana Kshirsagar, Sumit Mukherjee, Rahul Dodhia, Juan Lavista Ferres

SMOTE-DP: Improving Privacy-Utility Tradeoff with Synthetic Data

Privacy-preserving data publication, including synthetic data sharing, often experiences trade-offs between privacy and utility. Synthetic data is generally more effective than data anonymization in balancing this trade-off, however, not…

Machine Learning · Computer Science 2025-06-03 Yan Zhou, Bradley Malin, Murat Kantarcioglu

Inference With Combining Rules From Multiple Differentially Private Synthetic Datasets

Differential privacy (DP) has been accepted as a rigorous criterion for measuring the privacy protection offered by random mechanisms used to obtain statistics or, as we will study here, synthetic datasets from confidential data. Methods to…

Methodology · Statistics 2024-05-09 Leila Nombo, Anne-Sophie Charest

Evaluating Differentially Private Synthetic Data Generation in High-Stakes Domains

The difficulty of anonymizing text data hinders the development and deployment of NLP in high-stakes domains that involve private data, such as healthcare and social services. Poorly anonymized sensitive data cannot be easily shared with…

Computation and Language · Computer Science 2024-10-14 Krithika Ramesh, Nupoor Gandhi, Pulkit Madaan, Lisa Bauer, Charith Peris, Anjalie Field

Differentially Private Synthetic Data with Private Density Estimation

The need to analyze sensitive data, such as medical records or financial data, has created a critical research challenge in recent years. In this paper, we adopt the framework of differential privacy, and explore mechanisms for generating…

Cryptography and Security · Computer Science 2024-05-09 Nikolija Bojkovic, Po-Ling Loh

Differentially Private Non Parametric Copulas: Generating synthetic data with non parametric copulas under privacy guarantees

Creation of synthetic data models has represented a significant advancement across diverse scientific fields, but this technology also brings important privacy considerations for users. This work focuses on enhancing a non-parametric…

Machine Learning · Computer Science 2025-07-15 Pablo A. Osorio-Marulanda, John Esteban Castro Ramirez, Mikel Hernández Jiménez, Nicolas Moreno Reyes, Gorka Epelde Unanue