Related papers: Leveraging Programmatically Generated Synthetic Da…

Harnessing large-language models to generate private synthetic text

Differentially private training algorithms like DP-SGD protect sensitive training data by ensuring that trained models do not reveal private information. An alternative approach, which this paper studies, is to use a sensitive dataset to…

Machine Learning · Computer Science · 2024-01-12 · Alexey Kurakin, Natalia Ponomareva, Umar Syed, Liam MacDermed, Andreas Terzis

PrivImage: Differentially Private Synthetic Image Generation using Diffusion Models with Semantic-Aware Pretraining

Differential Privacy (DP) image data synthesis leverages the DP technique to generate synthetic data that replaces the sensitive data, allowing organizations to share and utilize synthetic images without privacy concerns. Previous…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-10 · Kecen Li, Chen Gong, Zhixiang Li, Yuzhong Zhao, Xinwen Hou, Tianhao Wang

Private Synthetic Data Meets Ensemble Learning

When machine learning models are trained on synthetic data and then deployed on real data, there is often a performance drop due to the distribution shift between synthetic and real data. In this paper, we introduce a new ensemble strategy…

Cryptography and Security · Computer Science · 2023-10-17 · Haoyuan Sun, Navid Azizan, Akash Srivastava, Hao Wang

From Easy to Hard: Building a Shortcut for Differentially Private Image Synthesis

Differentially private (DP) image synthesis aims to generate synthetic images from a sensitive dataset, alleviating the privacy leakage concerns of organizations sharing and utilizing synthetic images. Although previous methods have…

Cryptography and Security · Computer Science · 2025-06-24 · Kecen Li, Chen Gong, Xiaochen Li, Yuzhong Zhao, Xinwen Hou, Tianhao Wang

An Analysis of the Deployment of Models Trained on Private Tabular Synthetic Data: Unexpected Surprises

Differentially private (DP) synthetic datasets are a powerful approach for training machine learning models while respecting the privacy of individual data providers. The effect of DP on the fairness of the resulting trained models is not…

Machine Learning · Statistics · 2021-06-21 · Mayana Pereira, Meghana Kshirsagar, Sumit Mukherjee, Rahul Dodhia, Juan Lavista Ferres

Assessment of Differentially Private Synthetic Data for Utility and Fairness in End-to-End Machine Learning Pipelines for Tabular Data

Differentially private (DP) synthetic data sets are a solution for sharing data while preserving the privacy of individual data providers. Understanding the effects of utilizing DP synthetic data in end-to-end machine learning pipelines…

Machine Learning · Computer Science · 2023-10-31 · Mayana Pereira, Meghana Kshirsagar, Sumit Mukherjee, Rahul Dodhia, Juan Lavista Ferres, Rafael de Sousa

Differentially Private Synthetic Mixed-Type Data Generation For Unsupervised Learning

We introduce the DP-auto-GAN framework for synthetic data generation, which combines the low dimensional representation of autoencoders with the flexibility of Generative Adversarial Networks (GANs). This framework can be used to take in…

Machine Learning · Computer Science · 2020-12-11 · Uthaipon Tantipongpipat, Chris Waites, Digvijay Boob, Amaresh Ankit Siva, Rachel Cummings

Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe

Privacy concerns have attracted increasing attention in data-driven products due to the tendency of machine learning models to memorize sensitive training data. Generating synthetic versions of such data with a formal privacy guarantee,…

Computation and Language · Computer Science · 2023-07-19 · Xiang Yue, Huseyin A. Inan, Xuechen Li, Girish Kumar, Julia McAnallen, Hoda Shajari, Huan Sun, David Levitan, Robert Sim

DP-CGAN: Differentially Private Synthetic Data and Label Generation

Generative Adversarial Networks (GANs) are well-known models for generating synthetic data, including images, especially for research communities that cannot use the original sensitive datasets because they are not publicly accessible.…

Machine Learning · Computer Science · 2020-01-28 · Reihaneh Torkzadehmahani, Peter Kairouz, Benedict Paten

Private Training & Data Generation by Clustering Embeddings

Deep neural networks often use large, high-quality datasets to achieve high performance on many machine learning tasks. When training involves potentially sensitive data, this process can raise privacy concerns, as large models have been…

Machine Learning · Computer Science · 2025-06-23 · Felix Zhou, Samson Zhou, Vahab Mirrokni, Alessandro Epasto, Vincent Cohen-Addad

Private Set Generation with Discriminative Information

Differentially private data generation techniques have become a promising solution to the data privacy challenge: they enable data sharing while complying with rigorous privacy guarantees, which is essential for scientific progress in…

Cryptography and Security · Computer Science · 2022-11-09 · Dingfan Chen, Raouf Kerkouche, Mario Fritz

Differentially Private Diffusion Models

While modern machine learning models rely on increasingly large training datasets, data is often limited in privacy-sensitive domains. Generative models trained with differential privacy (DP) on sensitive data can sidestep this challenge,…

Machine Learning · Statistics · 2024-01-02 · Tim Dockhorn, Tianshi Cao, Arash Vahdat, Karsten Kreis

Differentially Private Synthetic Data Generation via Lipschitz-Regularised Variational Autoencoders

Synthetic data has been hailed as the silver bullet for privacy-preserving data analysis. If a record is not real, then how could it violate a person's privacy? In addition, deep-learning based generative models are employed successfully to…

Machine Learning · Computer Science · 2023-07-14 · Benedikt Groß, Gerhard Wunder

AugGen: Synthetic Augmentation using Diffusion Models Can Improve Recognition

The increasing reliance on large-scale datasets in machine learning poses significant privacy and ethical challenges, particularly in sensitive domains such as face recognition. Synthetic data generation offers a promising alternative;…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-27 · Parsa Rahimi, Damien Teney, Sebastien Marcel

DPGen: Automated Program Synthesis for Differential Privacy

Differential privacy has become a de facto standard for releasing data in a privacy-preserving way. Creating a differentially private algorithm is a process that often starts with a noise-free (non-private) algorithm. The designer then…

Cryptography and Security · Computer Science · 2021-09-16 · Yuxin Wang, Zeyu Ding, Yingtai Xiao, Daniel Kifer, Danfeng Zhang

P3GM: Private High-Dimensional Data Release via Privacy Preserving Phased Generative Model

How can we release a massive volume of sensitive data while mitigating privacy risks? Privacy-preserving data synthesis enables the data holder to outsource analytical tasks to an untrusted third party. The state-of-the-art approach for…

Machine Learning · Computer Science · 2022-03-08 · Shun Takagi, Tsubasa Takahashi, Yang Cao, Masatoshi Yoshikawa

Differentially Private Synthetic Medical Data Generation using Convolutional GANs

Deep learning models have demonstrated superior performance in several application problems, such as image classification and speech processing. However, creating a deep learning model using health record data requires addressing certain…

Machine Learning · Computer Science · 2021-12-14 · Amirsina Torfi, Edward A. Fox, Chandan K. Reddy

Generating Privacy-Preserving Process Data with Deep Generative Models

Process data containing confidential information cannot be shared publicly, which hinders research in process data mining and analytics. Data encryption methods have been studied to protect the data, but the data may still be decrypted,…

Machine Learning · Computer Science · 2022-03-16 · Keyi Li, Sen Yang, Travis M. Sullivan, Randall S. Burd, Ivan Marsic

Differentially Private Tabular Data Synthesis using Large Language Models

Synthetic tabular data generation with differential privacy is a crucial problem to enable data sharing with formal privacy. Despite a rich history of methodological research and development, developing differentially private tabular data…

Machine Learning · Computer Science · 2024-06-05 · Toan V. Tran, Li Xiong

Generating tabular datasets under differential privacy

Machine Learning (ML) is accelerating progress across fields and industries, but relies on accessible and high-quality training data. Some of the most important datasets are found in biomedical and financial domains in the form of…

Machine Learning · Computer Science · 2023-08-30 · Gianluca Truda