Related papers: An Open Source Python Library for Anonymizing Sens…

pyCANON: A Python library to check the level of anonymity of a dataset

Openly sharing data with sensitive attributes and privacy restrictions is a challenging task. In this document we present the implementation of pyCANON, a Python library and command line interface (CLI) to check and assess the level of…

Cryptography and Security · Computer Science 2023-05-15 Judith Sáinz-Pardo Díaz, Álvaro López García

Diversifying Anonymized Data with Diversity Constraints

Recently introduced privacy legislation has aimed to restrict and control the amount of personal data published by companies and shared to third parties. Much of this real data is not only sensitive requiring anonymization, but also…

Databases · Computer Science 2020-07-20 Mostafa Milani, Yu Huang, Fei Chiang

Techniques d'anonymisation tabulaire : concepts et mise en oeuvre (Tabular anonymization techniques: concepts and implementation)

In this document, we present a state of the art of anonymization techniques for classical tabular datasets. This article is geared towards a general public having some knowledge of mathematics and computer science, but with no need for…

Cryptography and Security · Computer Science 2020-01-09 Benjamin Nguyen, Claude Castelluccia

Anonymity-washing

Anonymization is a foundational principle of data privacy regulation, yet its practical application remains riddled with ambiguity and inconsistency. This paper introduces the concept of anonymity-washing -- the misrepresentation of the…

Cryptography and Security · Computer Science 2025-08-27 Szilvia Lestyán, William Letrone, Ludovica Robustelli, Gergely Biczók

Privacy- and Utility-Preserving NLP with Anonymized Data: A case study of Pseudonymization

This work investigates the effectiveness of different pseudonymization techniques, ranging from rule-based substitutions to using pre-trained Large Language Models (LLMs), on a variety of datasets and models used for two widely used NLP…

Computation and Language · Computer Science 2023-06-12 Oleksandr Yermilov, Vipul Raheja, Artem Chernodub

A Sensitive Attribute based Clustering Method for k-anonymization

In medical organizations, large amounts of personal data are collected and analyzed by data miners or researchers for further perusal. However, the data collected may contain sensitive information such as a patient's specific disease and…

Cryptography and Security · Computer Science 2012-03-19 Pawan R Bhaladhare, Devesh Jinwala

Diffprivlib: The IBM Differential Privacy Library

Since its conception in 2006, differential privacy has emerged as the de-facto standard in data privacy, owing to its robust mathematical guarantees, generalised applicability and rich body of literature. Over the years, researchers have…

Cryptography and Security · Computer Science 2019-07-05 Naoise Holohan, Stefano Braghin, Pól Mac Aonghusa, Killian Levacher

Tau-Eval: A Unified Evaluation Framework for Useful and Private Text Anonymization

Text anonymization is the process of removing or obfuscating information from textual data to protect the privacy of individuals. This process inherently involves a complex trade-off between privacy protection and information preservation,…

Computation and Language · Computer Science 2025-09-23 Gabriel Loiseau, Damien Sileo, Damien Riquet, Maxime Meyer, Marc Tommasi

Towards Utility-driven Anonymization of Transactions

Publishing person-specific transactions in an anonymous form is increasingly required by organizations. Recent approaches ensure that potentially identifying information (e.g., a set of diagnosis codes) cannot be used to link published…

Databases · Computer Science 2010-01-26 Grigorios Loukides, Aris Gkoulalas-Divanis, Bradley Malin

Secure k-Anonymization over Encrypted Databases

Data protection algorithms are becoming increasingly important to support modern business needs for facilitating data sharing and data monetization. Anonymization is an important step before data sharing. Several organizations leverage on…

Cryptography and Security · Computer Science 2021-08-11 Manish Kesarwani, Akshar Kaul, Stefano Braghin, Naoise Holohan, Spiros Antonatos

Enforcing transparent access to private content in social networks by means of automatic sanitization

Social networks have become an essential meeting point for millions of individuals willing to publish and consume huge quantities of heterogeneous information. Some studies have shown that the data published in these platforms may contain…

Cryptography and Security · Computer Science 2016-07-05 Alexandre Viejo, David Sánchez

A Multi-Objective Degree-Based Network Anonymization Approach

Enormous amounts of data collected from social networks or other online platforms are being published for the sake of statistics, marketing, and research, among other objectives. The consequent privacy and data security concerns have…

Cryptography and Security · Computer Science 2021-12-24 Ola N. Halawi, Faisal N. Abu-Khzam

Learning from Anonymized and Incomplete Tabular Data

User-driven privacy allows individuals to control whether and at what granularity their data is shared, leading to datasets that mix original, generalized, and missing values within the same records and attributes. While such…

Machine Learning · Computer Science 2026-02-03 Lucas Lange, Adrian Böttinger, Victor Christen, Anushka Vidanage, Peter Christen, Erhard Rahm

Properties of Effective Information Anonymity Regulations

A firm seeks to analyze a dataset and to release the results. The dataset contains information about individual people, and the firm is subject to some regulation that forbids the release of the dataset itself. The regulation also imposes…

Computers and Society · Computer Science 2024-08-28 Aloni Cohen, Micah Altman, Francesca Falzon, Evangelina Anna Markatou, Kobbi Nissim

Transparent Anonymization: Thwarting Adversaries Who Know the Algorithm

Numerous generalization techniques have been proposed for privacy preserving data publishing. Most existing techniques, however, implicitly assume that the adversary knows little about the anonymization algorithm adopted by the data…

Databases · Computer Science 2010-03-29 Xiaokui Xiao, Yufei Tao, Nick Koudas

A Review of Anonymization for Healthcare Data

Mining health data can lead to faster medical decisions, improvement in the quality of treatment, disease prevention, reduced cost, and it drives innovative solutions within the healthcare sector. However, health data is highly sensitive…

Cryptography and Security · Computer Science 2022-04-28 Iyiola E. Olatunji, Jens Rauch, Matthias Katzensteiner, Megha Khosla

Practical Aspect of Privacy-Preserving Data Publishing in Process Mining

Process mining techniques such as process discovery and conformance checking provide insights into actual processes by analyzing event data that are widely available in information systems. These data are very valuable, but often contain…

Cryptography and Security · Computer Science 2020-09-25 Majid Rafiei, Wil M. P. van der Aalst

PyRDM: A Python-based library for automating the management and online publication of scientific software and data

The recomputability and reproducibility of results from scientific software requires access to both the source code and all associated input and output data. However, the full collection of these resources often does not accompany the key…

Computational Engineering, Finance, and Science · Computer Science 2015-12-24 Christian T. Jacobs, Alexandros Avdis, Gerard J. Gorman, Matthew D. Piggott

The Boundary Between Privacy and Utility in Data Anonymization

We consider the privacy problem in data publishing: given a relation I containing sensitive information 'anonymize' it to obtain a view V such that, on one hand attackers cannot learn any sensitive information from V, and on the other hand…

Databases · Computer Science 2007-05-23 Vibhor Rastogi, Dan Suciu, Sungho Hong

Non-Interactive Differential Privacy: a Survey

The OpenData movement around the globe is demanding more access to information which lies locked in public or private servers. As recently reported by a McKinsey publication, this data has significant economic value, yet its release has…

Databases · Computer Science 2012-05-15 David Leoni