Privacy-Preserving Multi-Document Summarization

Information Retrieval | 2015-08-07 | v1 | Computation and Language | Cryptography and Security

Abstract

State-of-the-art extractive multi-document summarization systems are usually designed without regard for privacy, so all documents are exposed to third parties. In this paper we propose a privacy-preserving approach to multi-document summarization. Our approach enables other parties to obtain summaries without learning anything else about the content of the original documents. We use a hashing scheme known as Secure Binary Embeddings to convert document representations, consisting of key phrases and bag-of-words features, into bit strings, which allows approximate rather than exact distances to be computed. Our experiments indicate that our system yields results similar to those of its non-private counterpart on standard multi-document evaluation datasets.
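
The abstract names Secure Binary Embeddings but does not spell out the construction. As a rough illustration only, the sketch below assumes the commonly described universal-quantization form of the scheme: a random Gaussian projection, a random dither, and a quantizer that alternates between 0 and 1 over intervals of width delta. The names `make_sbe` and `hamming`, and all parameter values, are hypothetical and are not taken from the paper.

```python
import numpy as np


def make_sbe(dim, n_bits, delta, seed=0):
    """Return an embedding function mapping a real vector to n_bits bits.

    Assumed form: q(x) = floor((A x + w) / delta) mod 2, where A is a
    Gaussian random matrix and w is a uniform dither in [0, delta).
    A and w together play the role of the secret key.
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n_bits, dim))
    w = rng.uniform(0.0, delta, size=n_bits)

    def embed(x):
        # Project, dither, scale, then keep only the parity of the cell index.
        cells = np.floor((A @ np.asarray(x, dtype=float) + w) / delta)
        return (cells.astype(np.int64) % 2).astype(np.uint8)

    return embed


def hamming(a, b):
    """Normalized Hamming distance between two equal-length bit arrays."""
    return float(np.mean(a != b))
```

Under these assumptions, the normalized Hamming distance between two bit strings tracks the Euclidean distance between the underlying vectors only while that distance is small relative to `delta`; for vectors that are farther apart it levels off near 0.5, which is what limits what a party holding only the bit strings can infer.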

Cite

@article{arxiv.1508.01420,
  title  = {Privacy-Preserving Multi-Document Summarization},
  author = {Luís Marujo and José Portêlo and Wang Ling and David Martins de Matos and João P. Neto and Anatole Gershman and Jaime Carbonell and Isabel Trancoso and Bhiksha Raj},
  journal= {arXiv preprint arXiv:1508.01420},
  year   = {2015}
}

Comments

4 pages, In Proceedings of 2nd ACM SIGIR Workshop on Privacy-Preserving Information Retrieval, August 2015. arXiv admin note: text overlap with arXiv:1407.5416
