Related papers: Privacy Amplification by Missing Data
Recent research in differential privacy demonstrated that (sub)sampling can amplify the level of protection. For example, for $\epsilon$-differential privacy and simple random sampling with sampling rate $r$, the actual privacy guarantee is…
Differential privacy comes equipped with multiple analytical tools for the design of private data analyses. One important tool is the so-called "privacy amplification by subsampling" principle, which ensures that a differentially private…
Differential privacy provides a theoretical framework for processing a dataset about $n$ users, in a way that the output reveals a minimal information about any single user. Such notion of privacy is usually ensured by noise-adding…
This chapter is meant to be part of the book "Differential Privacy for Artificial Intelligence Applications." We give an introduction to the most important property of differential privacy -- composition: running multiple independent…
The literature on differential privacy almost invariably assumes that the data to be analyzed are fully observed. In most practical applications this is an unrealistic assumption. A popular strategy to address this problem is imputation, in…
Due to successful applications of data analysis technologies in many fields, various institutions have accumulated a large amount of data to improve their services. As the speed of data collection has increased dramatically over the last…
Differential privacy is a popular privacy model within the research community because of the strong privacy guarantee it offers, namely that the presence or absence of any individual in a data set does not significantly influence the…
Differential privacy is known to protect against threats to validity incurred due to adaptive, or exploratory, data analysis -- even when the analyst adversarially searches for a statistical estimate that diverges from the true value of the…
Despite several works that succeed in generating synthetic data with differential privacy (DP) guarantees, they are inadequate for generating high-quality synthetic data when the input data has missing values. In this work, we formalize the…
While pursuing better utility by discovering knowledge from the data, individual's privacy may be compromised during an analysis. To that end, differential privacy has been widely recognized as the state-of-the-art privacy notion. By…
In a technical treatment, this article establishes the necessity of transparent privacy for drawing unbiased statistical inference for a wide range of scientific questions. Transparency is a distinct feature enjoyed by differential privacy:…
Synthetic data inherits the differential privacy guarantees of the model used to generate it. Additionally, synthetic data may benefit from privacy amplification when the generative model is kept hidden. While empirical studies suggest this…
Data engineering often requires accuracy (utility) constraints on results, posing significant challenges in designing differentially private (DP) mechanisms, particularly under stringent privacy parameter $\epsilon$. In this paper, we…
We study privacy amplification by synthetic data release, a phenomenon in which differential privacy guarantees are improved by releasing only synthetic data rather than the private generative model itself. Recent work by Pierquin et al.…
Sequential querying of differentially private mechanisms degrades the overall privacy level. In this paper, we answer the fundamental question of characterizing the level of overall privacy degradation as a function of the number of queries…
When applying differential privacy to sensitive data, we can often improve performance using external information such as other sensitive data, public data, or human priors. We propose to use the learning-augmented algorithms (or algorithms…
Balancing privacy and accuracy is a major challenge in designing differentially private machine learning algorithms. One way to improve this tradeoff for free is to leverage the noise in common data operations that already use randomness.…
Big data is a term used for a very large data sets that have many difficulties in storing and processing the data. Analysis this much amount of data will lead to information loss. The main goal of this paper is to share data in a way that…
Sensitive statistics are often collected across sets of users, with repeated collection of reports done over time. For example, trends in users' private preferences or software usage may be monitored via such reports. We study the…
Differentially private mechanisms protect privacy by introducing additional randomness into the data. Restricting access to only the privatized data makes it challenging to perform valid statistical inference on parameters underlying the…