Anonymizing Unstructured Data

Databases 2008-11-04 v2 Data Structures and Algorithms

Abstract

In this paper we consider the problem of anonymizing datasets in which each individual is associated with a set of items that constitute private information about that individual. Illustrative datasets include market-basket datasets and search engine query logs. We formalize the notion of k-anonymity for set-valued data as a variant of the k-anonymity model for traditional relational datasets. We define an optimization problem that arises from this definition of anonymity and provide O(k log k)- and O(1)-approximation algorithms for it. We demonstrate the applicability of our algorithms to the America Online query log dataset.
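To make the anonymity notion concrete, the following minimal Python sketch checks a k-anonymity condition on set-valued records. It assumes, by analogy with relational k-anonymity, that a dataset counts as k-anonymous when every individual's item set is identical to the item sets of at least k-1 other individuals; the abstract does not spell out the definition, so this is an assumption. The function name `is_k_anonymous` and the sample baskets are hypothetical, and the sketch only tests the condition; it does not implement the approximation algorithms mentioned above.

```python
from collections import Counter

def is_k_anonymous(records, k):
    """Check the assumed condition: every item set occurs at least k times.

    `records` is an iterable of item collections, one per individual.
    This is an illustrative check, not the paper's algorithm.
    """
    # frozenset makes each item set hashable and ignores item order.
    counts = Counter(frozenset(r) for r in records)
    return all(c >= k for c in counts.values())

# Hypothetical market-basket data: two identical baskets and one unique one.
baskets = [
    {"milk", "bread"},
    {"bread", "milk"},
    {"beer"},
]

print(is_k_anonymous(baskets, 2))      # the lone {"beer"} basket violates k=2
print(is_k_anonymous(baskets[:2], 2))  # the two identical baskets satisfy k=2
```

Under this reading, anonymizing a dataset would mean modifying the item sets until the check passes, which is where an optimization problem of the kind the abstract describes would arise.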

Cite

@article{arxiv.0810.5582,
  title  = {Anonymizing Unstructured Data},
  author = {Rajeev Motwani and Shubha U. Nabar},
  journal= {arXiv preprint arXiv:0810.5582},
  year   = {2008}
}

Comments

9 pages, 1 figure
