Improved Compressed String Dictionaries

Nieves R. Brisaboa; Ana Cerdeira-Pena; Guillermo de Bernardo; Gonzalo Navarro

doi:10.1145/3357384.3357972


Data Structures and Algorithms 2019-11-20 v1



Abstract

We introduce a new family of compressed data structures to efficiently store and query large string dictionaries in main memory. Our main technique is a combination of hierarchical Front-coding with ideas from longest-common-prefix computation in suffix arrays. Our data structures yield relevant space-time tradeoffs in real-world dictionaries. We focus on two domains where string dictionaries are extensively used and efficient compression is required: URL collections, a key element in Web graphs and applications such as Web mining; and collections of URIs and literals, the basic components of RDF datasets. Our experiments show that our data structures achieve better compression than the state-of-the-art alternatives while providing very competitive query times.
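For readers unfamiliar with the building block named in the abstract, here is a minimal sketch of plain Front-coding over a sorted string list. It is not the authors' hierarchical variant and does not include the suffix-array-inspired ideas they mention; the function names (`lcp_len`, `front_encode`, `front_decode`) and the sample URLs are hypothetical and chosen only for illustration.

```python
def lcp_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def front_encode(sorted_strings):
    """Encode each string as (shared prefix length with predecessor, remaining suffix)."""
    encoded = []
    prev = ""
    for s in sorted_strings:
        k = lcp_len(prev, s)
        encoded.append((k, s[k:]))
        prev = s
    return encoded


def front_decode(encoded):
    """Rebuild the original strings by replaying the pairs in order."""
    decoded = []
    prev = ""
    for k, suffix in encoded:
        s = prev[:k] + suffix
        decoded.append(s)
        prev = s
    return decoded


# Illustrative input only; the list must be in lexicographic order.
urls = [
    "http://a.org/",
    "http://a.org/x",
    "http://a.org/xy",
    "http://b.org/",
]
encoded = front_encode(urls)
# encoded == [(0, "http://a.org/"), (13, "x"), (14, "y"), (7, "b.org/")]
```

Because consecutive entries in a sorted URL or URI collection tend to share long prefixes, most pairs store only a short suffix. Decoding an entry requires replaying its predecessors, which is why practical schemes typically store some strings in full at regular intervals; how the paper organizes this hierarchically is not shown here.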

Keywords

source coding; information retrieval; succinct data structure

Cite

@article{arxiv.1911.08372,
  title  = {Improved Compressed String Dictionaries},
  author = {Nieves R. Brisaboa and Ana Cerdeira-Pena and Guillermo de Bernardo and Gonzalo Navarro},
  journal= {arXiv preprint arXiv:1911.08372},
  year   = {2019}
}

Comments

This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 690941.
