
Related papers: Leveraging Programmatically Generated Synthetic Da…

200 papers

Differentially private training algorithms like DP-SGD protect sensitive training data by ensuring that trained models do not reveal private information. An alternative approach, which this paper studies, is to use a sensitive dataset to…

Machine Learning · Computer Science 2024-01-12 Alexey Kurakin, Natalia Ponomareva, Umar Syed, Liam MacDermed, Andreas Terzis

Differential Privacy (DP) image data synthesis leverages the DP technique to generate synthetic data that replaces the sensitive data, allowing organizations to share and utilize synthetic images without privacy concerns. Previous…

Computer Vision and Pattern Recognition · Computer Science 2024-10-10 Kecen Li, Chen Gong, Zhixiang Li, Yuzhong Zhao, Xinwen Hou, Tianhao Wang

When machine learning models are trained on synthetic data and then deployed on real data, there is often a performance drop due to the distribution shift between synthetic and real data. In this paper, we introduce a new ensemble strategy…

Cryptography and Security · Computer Science 2023-10-17 Haoyuan Sun, Navid Azizan, Akash Srivastava, Hao Wang

Differentially private (DP) image synthesis aims to generate synthetic images from a sensitive dataset, alleviating the privacy leakage concerns of organizations sharing and utilizing synthetic images. Although previous methods have…

Cryptography and Security · Computer Science 2025-06-24 Kecen Li, Chen Gong, Xiaochen Li, Yuzhong Zhao, Xinwen Hou, Tianhao Wang

Differentially private (DP) synthetic datasets are a powerful approach for training machine learning models while respecting the privacy of individual data providers. The effect of DP on the fairness of the resulting trained models is not…

Machine Learning · Statistics 2021-06-21 Mayana Pereira, Meghana Kshirsagar, Sumit Mukherjee, Rahul Dodhia, Juan Lavista Ferres

Differentially private (DP) synthetic data sets are a solution for sharing data while preserving the privacy of individual data providers. Understanding the effects of utilizing DP synthetic data in end-to-end machine learning pipelines…

Machine Learning · Computer Science 2023-10-31 Mayana Pereira, Meghana Kshirsagar, Sumit Mukherjee, Rahul Dodhia, Juan Lavista Ferres, Rafael de Sousa

We introduce the DP-auto-GAN framework for synthetic data generation, which combines the low dimensional representation of autoencoders with the flexibility of Generative Adversarial Networks (GANs). This framework can be used to take in…

Machine Learning · Computer Science 2020-12-11 Uthaipon Tantipongpipat, Chris Waites, Digvijay Boob, Amaresh Ankit Siva, Rachel Cummings

Privacy concerns have attracted increasing attention in data-driven products due to the tendency of machine learning models to memorize sensitive training data. Generating synthetic versions of such data with a formal privacy guarantee,…

Computation and Language · Computer Science 2023-07-19 Xiang Yue, Huseyin A. Inan, Xuechen Li, Girish Kumar, Julia McAnallen, Hoda Shajari, Huan Sun, David Levitan, Robert Sim

Generative Adversarial Networks (GANs) are among the best-known models for generating synthetic data, including images, especially for research communities that cannot use original sensitive datasets because they are not publicly accessible.…

Machine Learning · Computer Science 2020-01-28 Reihaneh Torkzadehmahani, Peter Kairouz, Benedict Paten

Deep neural networks often use large, high-quality datasets to achieve high performance on many machine learning tasks. When training involves potentially sensitive data, this process can raise privacy concerns, as large models have been…

Machine Learning · Computer Science 2025-06-23 Felix Zhou, Samson Zhou, Vahab Mirrokni, Alessandro Epasto, Vincent Cohen-Addad

Differentially private data generation techniques have become a promising solution to the data privacy challenge: they enable sharing of data while complying with rigorous privacy guarantees, which is essential for scientific progress in…

Cryptography and Security · Computer Science 2022-11-09 Dingfan Chen, Raouf Kerkouche, Mario Fritz

While modern machine learning models rely on increasingly large training datasets, data is often limited in privacy-sensitive domains. Generative models trained with differential privacy (DP) on sensitive data can sidestep this challenge,…

Machine Learning · Statistics 2024-01-02 Tim Dockhorn, Tianshi Cao, Arash Vahdat, Karsten Kreis

Synthetic data has been hailed as the silver bullet for privacy preserving data analysis. If a record is not real, then how could it violate a person's privacy? In addition, deep-learning based generative models are employed successfully to…

Machine Learning · Computer Science 2023-07-14 Benedikt Groß, Gerhard Wunder

The increasing reliance on large-scale datasets in machine learning poses significant privacy and ethical challenges, particularly in sensitive domains such as face recognition. Synthetic data generation offers a promising alternative;…

Computer Vision and Pattern Recognition · Computer Science 2025-10-27 Parsa Rahimi, Damien Teney, Sebastien Marcel

Differential privacy has become a de facto standard for releasing data in a privacy-preserving way. Creating a differentially private algorithm is a process that often starts with a noise-free (non-private) algorithm. The designer then…

Cryptography and Security · Computer Science 2021-09-16 Yuxin Wang, Zeyu Ding, Yingtai Xiao, Daniel Kifer, Danfeng Zhang

How can we release a massive volume of sensitive data while mitigating privacy risks? Privacy-preserving data synthesis enables the data holder to outsource analytical tasks to an untrusted third party. The state-of-the-art approach for…

Machine Learning · Computer Science 2022-03-08 Shun Takagi, Tsubasa Takahashi, Yang Cao, Masatoshi Yoshikawa

Deep learning models have demonstrated superior performance in several application problems, such as image classification and speech processing. However, creating a deep learning model using health record data requires addressing certain…

Machine Learning · Computer Science 2021-12-14 Amirsina Torfi, Edward A. Fox, Chandan K. Reddy

Process data with confidential information cannot be shared directly in public, which hinders research in process data mining and analytics. Data encryption methods have been studied to protect the data, but encrypted data may still be decrypted,…

Machine Learning · Computer Science 2022-03-16 Keyi Li, Sen Yang, Travis M. Sullivan, Randall S. Burd, Ivan Marsic

Synthetic tabular data generation with differential privacy is a crucial problem to enable data sharing with formal privacy. Despite a rich history of methodological research and development, developing differentially private tabular data…

Machine Learning · Computer Science 2024-06-05 Toan V. Tran, Li Xiong

Machine Learning (ML) is accelerating progress across fields and industries, but relies on accessible and high-quality training data. Some of the most important datasets are found in biomedical and financial domains in the form of…

Machine Learning · Computer Science 2023-08-30 Gianluca Truda