Related papers: Sampling the suffix array with minimizers

Suffix arrays with a twist

The suffix array is a classic full-text index, combining effectiveness with simplicity. We discuss three approaches aiming to improve its efficiency even more: changes to the navigation, data layout and adding extra data. In short, we show…

Data Structures and Algorithms · Computer Science 2016-07-28 Tomasz Kowalski, Szymon Grabowski, Kimmo Fredriksson, Marcin Raniszewski

Suffixient Arrays: a New Efficient Suffix Array Compression Technique

The Suffix Array is a classic text index enabling on-line pattern matching queries via simple binary search. The main drawback of the Suffix Array is that it takes linear space in the text's length, even if the text itself is extremely…

Data Structures and Algorithms · Computer Science 2025-03-19 Davide Cenzato, Lore Depuydt, Travis Gagie, Sung-Hwan Kim, Giovanni Manzini, Francisco Olivares, Nicola Prezza

New Algorithms for Position Heaps

We present several results about position heaps, a relatively new alternative to suffix trees and suffix arrays. First, we show that, if we limit the maximum length of patterns to be sought, then we can also limit the height of the heap and…

Data Structures and Algorithms · Computer Science 2013-01-15 Travis Gagie, Wing-Kai Hon, Tsung-Han Ku

Suffix sorting via matching statistics

We introduce a new algorithm for constructing the generalized suffix array of a collection of highly similar strings. As a first step, we construct a compressed representation of the matching statistics of the collection with respect to a…

Data Structures and Algorithms · Computer Science 2024-04-16 Zsuzsanna Lipták, Francesco Masillo, Simon J. Puglisi

Sampled Longest Common Prefix Array

When augmented with the longest common prefix (LCP) array and some other structures, the suffix array can solve many string processing problems in optimal time and space. A compressed representation of the LCP array is also one of the main…

Data Structures and Algorithms · Computer Science 2010-06-30 Jouni Sirén

Efficient Online String Matching Based on Characters Distance Text Sampling

Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, like natural language processing, information retrieval and computational biology. Sampled string…

Data Structures and Algorithms · Computer Science 2019-08-19 Simone Faro, Arianna Pavone, Francesco Pio Marino

Pattern Sampling for Shapelet-based Time Series Classification

Subsequence-based time series classification algorithms provide accurate and interpretable models, but training these models is extremely computation intensive. The asymptotic time complexity of subsequence-based algorithms remains a…

Machine Learning · Computer Science 2021-02-18 Atif Raza, Stefan Kramer

Sparse Suffix and LCP Array: Simple, Direct, Small, and Fast

Sparse suffix sorting is the problem of sorting $b=o(n)$ suffixes of a string of length $n$. Efficient sparse suffix sorting algorithms have existed for more than a decade. Despite the multitude of works and their justified claims for…

Data Structures and Algorithms · Computer Science 2024-07-08 Lorraine A. K. Ayad, Grigorios Loukides, Solon P. Pissis, Hilde Verbeek

Linear pattern matching on sparse suffix trees

Packing several characters into one computer word is a simple and natural way to compress the representation of a string and to speed up its processing. Exploiting this idea, we propose an index for a packed string, based on a sparse…

Data Structures and Algorithms · Computer Science 2015-03-19 Roman Kolpakov, Gregory Kucherov, Tatiana Starikovskaya

Parallel Suffix Array Construction by Accelerated Sampling

A deterministic BSP algorithm for constructing the suffix array of a given string is presented, based on a technique which we call accelerated sampling. It runs in optimal O(n/p) local computation and communication, and requires a near…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-02-26 Matthew Felice Pace, Alexander Tiskin

Suffix Stripping Problem as an Optimization Problem

Stemming or suffix stripping, an important part of the modern Information Retrieval systems, is to find the root word (stem) out of a given cluster of words. Existing algorithms targeting this problem have been developed in a haphazard…

Information Retrieval · Computer Science 2013-12-25 B. P. Pande, Pawan Tamta, H. S. Dhami

Two simple full-text indexes based on the suffix array

We propose two suffix array inspired full-text indexes. One, called SA-hash, augments the suffix array with a hash table to speed up pattern searches due to significantly narrowed search interval before the binary search phase. The other,…

Data Structures and Algorithms · Computer Science 2016-05-24 Szymon Grabowski, Marcin Raniszewski

Large-Scale Pattern Search Using Reduced-Space On-Disk Suffix Arrays

The suffix array is an efficient data structure for in-memory pattern search. Suffix arrays can also be used for external-memory pattern search, via two-level structures that use an internal index to identify the correct block of suffix…

Data Structures and Algorithms · Computer Science 2013-03-27 Simon Gog, Alistair Moffat, J. Shane Culpepper, Andrew Turpin, Anthony Wirth

Fast Prefix Search in Little Space, with Applications

It has been shown in the indexing literature that there is an essential difference between prefix/range searches on the one hand, and predecessor/rank searches on the other hand, in that the former provably allows faster query resolution.…

Data Structures and Algorithms · Computer Science 2018-04-16 Djamal Belazzougui, Paolo Boldi, Rasmus Pagh, Sebastiano Vigna

Memory-Efficient Sampling for Minimax Distance Measures

Minimax distance measure extracts the underlying patterns and manifolds in an unsupervised manner. The existing methods require a quadratic memory with respect to the number of objects. In this paper, we investigate efficient sampling…

Machine Learning · Computer Science 2020-05-27 Fazeleh Sadat Hoseini, Morteza Haghir Chehreghani

On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching

We present parallel algorithms for exact and approximate pattern matching with suffix arrays, using a CREW-PRAM with $p$ processors. Given a static text of length $n$, we first show how to compute the suffix array interval of a given…

Data Structures and Algorithms · Computer Science 2016-06-09 Johannes Fischer, Dominik Köppl, Florian Kurpicz

Compressed Spaced Suffix Arrays

Spaced seeds are important tools for similarity search in bioinformatics, and using several seeds together often significantly improves their performance. With existing approaches, however, for each seed we keep a separate linear-size data…

Data Structures and Algorithms · Computer Science 2014-03-11 Travis Gagie, Giovanni Manzini, Daniel Valenzuela

Sampling in the Analysis Transform Domain

Many signal and image processing applications have benefited remarkably from the fact that the underlying signals reside in a low dimensional subspace. One of the main models for such a low dimensionality is the sparsity one. Within this…

Information Theory · Computer Science 2015-03-25 Raja Giryes

Efficient repeat finding via suffix arrays

We solve the problem of finding interspersed maximal repeats using a suffix array construction. As it is well known, all the functionality of suffix trees can be handled by suffix arrays, gaining practicality. Our solution improves the…

Data Structures and Algorithms · Computer Science 2013-04-03 Veronica Becher, Alejandro Deymonnaz, Pablo Ariel Heiber

Testing Suffixient Sets

Suffixient sets are a novel prefix array (PA) compression technique based on subsampling PA (rather than compressing the entire array like previous techniques used to do): by storing very few entries of PA (in fact, a compressed number of…

Data Structures and Algorithms · Computer Science 2025-06-11 Davide Cenzato, Francisco Olivares, Nicola Prezza