Related papers: Diffix-Birch: Extending Diffix-Aspen
Anonymized data is highly valuable to both businesses and researchers. A large body of research has, however, shown the strong limits of the de-identification release-and-forget model, where data is anonymized and shared. This has led to the…
Historically, strong data anonymization requires substantial domain expertise and custom design for the given data set and use case. Diffix is an anonymization framework designed to make strong data anonymization available to non-experts.…
Data mining deals with automatic extraction of previously unknown patterns from large amounts of data. Organizations all over the world handle large amounts of data and are dependent on mining gigantic data sets for expansion of their…
We describe and evaluate an attack that reconstructs the histogram of any target attribute of a sensitive dataset which can only be queried through a specific class of real-world privacy-preserving algorithms which we call bounded…
Differential privacy is achieved by the introduction of Laplacian noise in the response to a query, establishing a precise trade-off between the level of differential privacy and the accuracy of the database response (via the amount of…
The re-identification or de-anonymization of users from anonymized data through matching with publicly available correlated user data has raised privacy concerns, leading to the complementary measure of obfuscation in addition to…
Differential privacy (DP) is the state-of-the-art and rigorous notion of privacy for answering aggregate database queries while preserving the privacy of sensitive information in the data. In today's era of data analysis, however, it poses…
Preserving the privacy of continuous and/or high-dimensional data such as images, video, and audio can be challenging with syntactic anonymization methods, which are designed for discrete attributes. Differential privacy, which provides a more…
Data obfuscation deals with the problem of masking a data set in such a way that the utility of the data is maximized while the risk of disclosing sensitive information is minimized. To protect data we address some ways that may as…
Firms and statistical agencies must protect the privacy of the individuals whose data they collect, analyze, and publish. Increasingly, these organizations do so by using publication mechanisms that satisfy differential privacy. We consider…
Latent diffusion models can be used as a powerful augmentation method to artificially extend datasets for enhanced training. To the human eye, these augmented images look very different to the originals. Previous work has suggested to use…
Effective information disclosure in the context of databases with a large conceptual schema is known to be a non-trivial problem. In particular the formulation of ad-hoc queries is a major problem in such contexts. Existing approaches for…
This is a paper about private data analysis, in which a trusted curator holding a confidential database responds to real vector-valued queries. A common approach to ensuring privacy for the database elements is to add appropriately…
Differential Privacy (DP) considers a scenario in which an adversary has almost complete information about the entries of a database. This worst-case assumption is likely to overestimate the privacy threat faced by an individual in…
Differential privacy is a modern approach in privacy-preserving data analysis to control the amount of information that can be inferred about an individual by querying a database. The most common techniques are based on the introduction of…
In 2011 Bhaskar et al. pointed out that in many cases one can ensure a sufficient level of privacy without adding noise by utilizing adversarial uncertainty. Informally speaking, this observation comes from the fact that if at least a part of…
This paper is motivated by applications of a Census Bureau interested in releasing aggregate socio-economic data about a large population without revealing sensitive information about any individual. The released information can be the…
Linear queries can be submitted to a server containing private data. The server provides a response to the queries systematically corrupted using additive noise to preserve the privacy of those whose data is stored on the server. The…
Conflicts of interest often arise between data sources and their users regarding how the users' information needs should be interpreted by the data source. For example, an online product search might be biased towards presenting certain…