Related papers: Differentially Private Data Generation with Missin…

200 papers

Privacy concerns have attracted increasing attention in data-driven products due to the tendency of machine learning models to memorize sensitive training data. Generating synthetic versions of such data with a formal privacy guarantee,…

Computation and Language · Computer Science 2023-07-19 Xiang Yue, Huseyin A. Inan, Xuechen Li, Girish Kumar, Julia McAnallen, Hoda Shajari, Huan Sun, David Levitan, Robert Sim

High-quality data is needed to unlock the full potential of AI for end users. However, finding new sources of such data is getting harder: most publicly available human-generated data will soon have been used. Additionally, publicly…

Data privacy is a core tenet of responsible computing, and in the United States, differential privacy (DP) is the dominant technical operationalization of privacy-preserving data analysis. With this study, we qualitatively examine one class…

Human-Computer Interaction · Computer Science 2024-12-18 Lucas Rosenblatt, Bill Howe, Julia Stoyanovich

Privacy-preserving synthetic data offers a promising solution to harness segregated data in high-stakes domains where information is compartmentalized for regulatory, privacy, or institutional reasons. This survey provides a comprehensive…

Cryptography and Security · Computer Science 2025-03-28 Viktor Schlegel, Anil A Bharath, Zilong Zhao, Kevin Yee

Existing differentially private (DP) synthetic data generation mechanisms typically assume a single-source table. In practice, data is often distributed across multiple tables with relationships across tables. In this paper, we introduce…

Machine Learning · Computer Science 2025-01-22 Kaveh Alimohammadi, Hao Wang, Ojas Gulati, Akash Srivastava, Navid Azizan

Differential privacy (DP) provides a principled approach to synthesizing data (e.g., loads) from real-world power systems while limiting the exposure of sensitive information. However, adversaries may exploit synthetic data to calibrate…

Systems and Control · Electrical Eng. & Systems 2025-05-05 Shengyang Wu, Vladimir Dvorkin

Background: Synthetic data has been proposed as a solution for sharing anonymized versions of sensitive biomedical datasets. Ideally, synthetic data should preserve the structure and statistical properties of the original data, while…

Machine Learning · Computer Science 2024-10-24 Ileana Montoya Perez, Parisa Movahedi, Valtteri Nieminen, Antti Airola, Tapio Pahikkala

Generative AI offers transformative potential for high-stakes domains such as healthcare and finance, yet privacy and regulatory barriers hinder the use of real-world data. To address this, differentially private synthetic data generation…

Generative models trained with Differential Privacy (DP) are becoming increasingly prominent in the creation of synthetic data for downstream applications. Existing literature, however, primarily focuses on basic benchmarking datasets and…

Cryptography and Security · Computer Science 2024-02-08 Dingfan Chen, Marie Oestreich, Tejumade Afonja, Raouf Kerkouche, Matthias Becker, Mario Fritz

We address the challenge of ensuring differential privacy (DP) guarantees in training deep retrieval systems. Training these systems often involves the use of contrastive-style losses, which are typically non-per-example decomposable,…

Computation and Language · Computer Science 2024-05-24 Aldo Gael Carranza, Rezsa Farahani, Natalia Ponomareva, Alex Kurakin, Matthew Jagielski, Milad Nasr

We propose a method for the release of differentially private synthetic datasets. In many contexts, data contain sensitive values which cannot be released in their original form in order to protect individuals' privacy. Synthetic data is a…

Methodology · Statistics 2018-05-25 Joshua Snoke, Aleksandra Slavković

Differentially private (DP) synthetic data generation is a practical method for improving access to data as a means to encourage productive partnerships. One issue inherent to DP is that the "privacy budget" is generally "spent" evenly…

Machine Learning · Computer Science 2022-08-11 Lucas Rosenblatt, Joshua Allen, Julia Stoyanovich

Privacy Preserving Synthetic Data Generation (PP-SDG) has emerged to produce synthetic datasets from personal data while maintaining privacy and utility. Differential privacy (DP) is the property of a PP-SDG mechanism that establishes how…

Cryptography and Security · Computer Science 2025-07-23 Frederik Marinus Trudslev, Matteo Lissandrini, Juan Manuel Rodriguez, Martin Bøgsted, Daniele Dell'Aglio

Techniques to deliver privacy-preserving synthetic datasets take a sensitive dataset as input and produce a similar dataset as output while maintaining differential privacy. These approaches have the potential to improve data sharing and…

Databases · Computer Science 2018-08-24 Luke Rodriguez, Bill Howe

Differentially private (DP) synthetic datasets are a powerful approach for training machine learning models while respecting the privacy of individual data providers. The effect of DP on the fairness of the resulting trained models is not…

Machine Learning · Statistics 2021-06-21 Mayana Pereira, Meghana Kshirsagar, Sumit Mukherjee, Rahul Dodhia, Juan Lavista Ferres

Privacy-preserving data publication, including synthetic data sharing, often experiences trade-offs between privacy and utility. Synthetic data is generally more effective than data anonymization in balancing this trade-off; however, not…

Machine Learning · Computer Science 2025-06-03 Yan Zhou, Bradley Malin, Murat Kantarcioglu

Differential privacy (DP) has been accepted as a rigorous criterion for measuring the privacy protection offered by random mechanisms used to obtain statistics or, as we will study here, synthetic datasets from confidential data. Methods to…

Methodology · Statistics 2024-05-09 Leila Nombo, Anne-Sophie Charest

The difficulty of anonymizing text data hinders the development and deployment of NLP in high-stakes domains that involve private data, such as healthcare and social services. Poorly anonymized sensitive data cannot be easily shared with…

Computation and Language · Computer Science 2024-10-14 Krithika Ramesh, Nupoor Gandhi, Pulkit Madaan, Lisa Bauer, Charith Peris, Anjalie Field

The need to analyze sensitive data, such as medical records or financial data, has created a critical research challenge in recent years. In this paper, we adopt the framework of differential privacy, and explore mechanisms for generating…

Cryptography and Security · Computer Science 2024-05-09 Nikolija Bojkovic, Po-Ling Loh

Creation of synthetic data models has represented a significant advancement across diverse scientific fields, but this technology also brings important privacy considerations for users. This work focuses on enhancing a non-parametric…
