Related papers: Improved Compressed String Dictionaries

Compressed String Dictionaries

The problem of storing a set of strings --- a string dictionary --- in compact form appears naturally in many cases. While classically it has represented a small part of the whole data to be processed (e.g., for Natural Language processing…

Data Structures and Algorithms · Computer Science 2011-01-31 Nieves R. Brisaboa, Rodrigo Cánovas, Miguel A. Martínez-Prieto, Gonzalo Navarro

Relative Suffix Trees

Suffix trees are one of the most versatile data structures in stringology, with many applications in bioinformatics. Their main drawback is their size, which can be tens of times larger than the input sequence. Much effort has been put into…

Data Structures and Algorithms · Computer Science 2017-12-18 Andrea Farruggia, Travis Gagie, Gonzalo Navarro, Simon J. Puglisi, Jouni Sirén

Fast Prefix Search in Little Space, with Applications

It has been shown in the indexing literature that there is an essential difference between prefix/range searches on the one hand, and predecessor/rank searches on the other hand, in that the former provably allows faster query resolution.…

Data Structures and Algorithms · Computer Science 2018-04-16 Djamal Belazzougui, Paolo Boldi, Rasmus Pagh, Sebastiano Vigna

Faster Repetition-Aware Compressed Suffix Trees based on Block Trees

Suffix trees are a fundamental data structure in stringology, but their space usage, though linear, is an important problem for its applications. We design and implement a new compressed suffix tree targeted to highly repetitive texts, such…

Data Structures and Algorithms · Computer Science 2019-02-12 Manuel Cáceres, Gonzalo Navarro

Indexing Highly Repetitive String Collections

Two decades ago, a breakthrough in indexing string collections made it possible to represent them within their compressed space while at the same time offering indexed search functionalities. As this new technology permeated through…

Data Structures and Algorithms · Computer Science 2022-11-28 Gonzalo Navarro

Fast Compressed Tries through Path Decompositions

Tries are popular data structures for storing a set of strings, where common prefixes are represented by common root-to-node paths. Over fifty years of usage have produced many variants and implementations to overcome some of their…

Data Structures and Algorithms · Computer Science 2011-12-06 Roberto Grossi, Giuseppe Ottaviano

A Novel Approach to Compress Centralized Text Data using Indexed Dictionary

Data compression is a very important feature in terms of saving memory space. In this proposal, an indexed dictionary based compression is used for text data, where the word's reference in the dictionary is used for compression. This approach…

Other Computer Science · Computer Science 2015-12-23 Vivek Dimri, Prof. Ranjit Biswas

Optimal Random Access and Conditional Lower Bounds for 2D Compressed Strings

Compressed indexing is a powerful technique that enables efficient querying over data stored in compressed form, significantly reducing memory usage and often accelerating computation. While extensive progress has been made for…

Data Structures and Algorithms · Computer Science 2025-10-23 Rajat De, Dominik Kempa

Engineering Rank/Select Data Structures for Large-Alphabet Strings

Large-alphabet strings are common in scenarios such as information retrieval and natural-language processing. The efficient storage and processing of such strings usually introduces several challenges that are not witnessed in…

Data Structures and Algorithms · Computer Science 2024-05-03 Diego Arroyuelo, Gabriel Carmona, Héctor Larrañaga, Francisco Riveros, Carlos Eugenio Rojas-Morales, Erick Sepúlveda

The Wavelet Trie: Maintaining an Indexed Sequence of Strings in Compressed Space

An indexed sequence of strings is a data structure for storing a string sequence that supports random access, searching, range counting and analytics operations, both for exact matches and prefix search. String sequences lie at the core of…

Data Structures and Algorithms · Computer Science 2012-04-17 Roberto Grossi, Giuseppe Ottaviano

String Indexing with Compressed Patterns

Given a string $S$ of length $n$, the classic string indexing problem is to preprocess $S$ into a compact data structure that supports efficient subsequent pattern queries. In this paper we consider the basic variant where the pattern is…

Data Structures and Algorithms · Computer Science 2024-02-15 Philip Bille, Inge Li Gørtz, Teresa Anna Steiner

Compressed Indexes for Fast Search of Semantic Data

The sheer increase in volume of RDF data demands efficient solutions for the triple indexing problem, that is devising a compressed data structure to compactly represent RDF triples by guaranteeing, at the same time, fast pattern matching…

Information Retrieval · Computer Science 2022-02-08 Raffaele Perego, Giulio Ermanno Pibiri, Rossano Venturini

Document Retrieval on Repetitive String Collections

Most of the fastest-growing string collections today are repetitive, that is, most of the constituent documents are similar to many others. As these collections keep growing, a key approach to handling them is to exploit their…

Information Retrieval · Computer Science 2017-05-22 Travis Gagie, Aleksi Hartikainen, Kalle Karhu, Juha Kärkkäinen, Gonzalo Navarro, Simon J. Puglisi, Jouni Sirén

Extension of Dictionary-Based Compression Algorithms for the Quantitative Visualization of Patterns from Log Files

Many services today massively and continuously produce log files of different and varying formats. These logs are important since they contain information about the application activities, which is necessary for improvements by analyzing…

Information Retrieval · Computer Science 2023-04-11 Igor Cherepanov, Jonathan Geraldi Joewono, Arjan Kuijper, Jörn Kohlhammer

Fast, Small and Exact: Infinite-order Language Modelling with Compressed Suffix Trees

Efficient methods for storing and querying are critical for scaling high-order n-gram language models to large corpora. We propose a language model based on compressed suffix trees, a representation that is highly compact and can be easily…

Computation and Language · Computer Science 2016-08-17 Ehsan Shareghi, Matthias Petri, Gholamreza Haffari, Trevor Cohn

Dynamic Path-Decomposed Tries

A keyword dictionary is an associative array whose keys are strings. Recent applications handling massive keyword dictionaries in main memory have a need for a space-efficient implementation. When limited to static applications, there are a…

Data Structures and Algorithms · Computer Science 2020-07-23 Shunsuke Kanda, Dominik Köppl, Yasuo Tabei, Kazuhiro Morita, Masao Fuketa

Learned Data Compression: Challenges and Opportunities for the Future

Compressing integer keys is a fundamental operation among multiple communities, such as database management (DB), information retrieval (IR), and high-performance computing (HPC). Recent advances in learned indexes have inspired the…

Databases · Computer Science 2024-12-17 Qiyu Liu, Siyuan Han, Jianwei Liao, Jin Li, Jingshu Peng, Jun Du, Lei Chen

Techniques for Inverted Index Compression

The data structure at the core of large-scale search engines is the inverted index, which is essentially a collection of sorted integer sequences called inverted lists. Because of the many documents indexed by such engines and stringent…

Information Retrieval · Computer Science 2022-02-08 Giulio Ermanno Pibiri, Rossano Venturini

Compressing the Data Densely by New Geflochtener to Accelerate Web

At the present scenario of the internet, there exist many optimization techniques to improve the Web speed but almost expensive in terms of bandwidth. So after a long investigation on different techniques to compress the data without any…

Information Theory · Computer Science 2014-05-20 Hemant Kumar Saini, Satpal Singh Kushwaha, C. Rama Krishna

Grammar Compressed Sequences with Rank/Select Support

Sequence representations supporting not only direct access to their symbols, but also rank/select operations, are a fundamental building block in many compressed data structures. Several recent applications need to represent highly…

Data Structures and Algorithms · Computer Science 2019-11-25 Alberto Ordóñez, Gonzalo Navarro, Nieves R. Brisaboa