Related papers: A Simple Algorithm for Computing the Document Arra…

An Elegant Algorithm for the Construction of Suffix Arrays

The suffix array is a data structure that finds numerous applications in string processing problems for both linguistic texts and biological data. It has been introduced as a memory efficient alternative for suffix trees. The suffix array…

Data Structures and Algorithms · Computer Science 2013-07-05 Sanguthevar Rajasekaran, Marius Nicolae

Suffix sorting via matching statistics

We introduce a new algorithm for constructing the generalized suffix array of a collection of highly similar strings. As a first step, we construct a compressed representation of the matching statistics of the collection with respect to a…

Data Structures and Algorithms · Computer Science 2024-04-16 Zsuzsanna Lipták, Francesco Masillo, Simon J. Puglisi

Inducing the Lyndon Array

In this paper we propose a variant of the induced suffix sorting algorithm by Nong (TOIS, 2013) that computes simultaneously the Lyndon array and the suffix array of a text in $O(n)$ time using $\sigma + O(1)$ words of working space, where…

Data Structures and Algorithms · Computer Science 2020-09-10 Felipe A. Louza, Sabrina Mantaci, Giovanni Manzini, Marinella Sciortino, Guilherme P. Telles

Fast, Small, and Simple Document Listing on Repetitive Text Collections

Document listing on string collections is the task of finding all documents where a pattern appears. It is regarded as the most fundamental document retrieval problem, and is useful in various applications. Many of the fastest-growing…

Data Structures and Algorithms · Computer Science 2019-02-21 Dustin Cobas, Gonzalo Navarro

Computing All Distinct Squares in Linear Time for Integer Alphabets

Given a string on an integer alphabet, we present an algorithm that computes the set of all distinct squares belonging to this string in time linear in the string length. As an application, we show how to compute the tree topology of the…

Data Structures and Algorithms · Computer Science 2017-02-21 Hideo Bannai, Shunsuke Inenaga, Dominik Köppl

Algorithms to Compute the Lyndon Array

We first describe three algorithms for computing the Lyndon array that have been suggested in the literature, but for which no structured exposition has been given. Two of these algorithms execute in quadratic time in the worst case, the…

Data Structures and Algorithms · Computer Science 2016-05-31 Frantisek Franek, A. S. M. Sohidull Islam, M. Sohel Rahman, W. F. Smyth

Optimal In-Place Suffix Sorting

The suffix array is a fundamental data structure for many applications that involve string searching and data compression. Designing time/space-efficient suffix array construction algorithms has attracted significant attention and…

Data Structures and Algorithms · Computer Science 2018-11-12 Zhize Li, Jian Li, Hongwei Huo

Computing Covers Using Prefix Tables

An \emph{indeterminate string} $x = x[1..n]$ on an alphabet $\Sigma$ is a sequence of nonempty subsets of $\Sigma$; $x$ is said to be \emph{regular} if every subset is of size one. A proper substring $u$ of regular $x$ is said to be a…

Data Structures and Algorithms · Computer Science 2015-03-02 Ali Alatabbi, M. Sohel Rahman, W. F. Smyth

Parallel Suffix Array Construction by Accelerated Sampling

A deterministic BSP algorithm for constructing the suffix array of a given string is presented, based on a technique which we call accelerated sampling. It runs in optimal O(n/p) local computation and communication, and requires a near…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-02-26 Matthew Felice Pace, Alexander Tiskin

Linear Time Inference of Strings from Cover Arrays using a Binary Alphabet

Covers, being one of the most popular forms of regularity in strings, have drawn much attention over time. In this paper, we focus on the problem of linear time inference of strings from cover arrays using the smallest possible alphabet.…

Data Structures and Algorithms · Computer Science 2011-08-30 Tanaeem M. Moosa, Sumaiya Nazeen, M. Sohel Rahman, Rezwana Reaz

A comparison of two suffix tree-based document clustering algorithms

Document clustering is an unsupervised approach extensively used to navigate, filter, summarize and manage large collections of document repositories like the World Wide Web (WWW). Recently, the focus in this domain has shifted from traditional…

Information Retrieval · Computer Science 2012-01-11 Muhammad Rafi, M. Maujood, M. M. Fazal, S. M. Ali

Linear Index for Logarithmic Search-Time for any String under any Internal Node in Suffix Trees

Suffix trees are a key and efficient data structure for solving string problems. A suffix tree is a compressed trie containing all the suffixes of a given text of length $n$ with a linear construction cost. In this work, we introduce an…

Data Structures and Algorithms · Computer Science 2024-06-04 Anas Al-okaily

Lightweight LCP-Array Construction in Linear Time

The suffix tree is a very important data structure in string processing, but it suffers from a huge space consumption. In large-scale applications, compressed suffix trees (CSTs) are therefore used instead. A CST consists of three…

Data Structures and Algorithms · Computer Science 2010-12-21 Simon Gog , Enno Ohlebusch

Scalable Construction of Text Indexes

The suffix array is the key to efficient solutions for myriads of string processing problems in different application domains, like data compression, data mining, or bioinformatics. With the rapid growth of available data, suffix array…

Data Structures and Algorithms · Computer Science 2016-10-11 Timo Bingmann, Simon Gog, Florian Kurpicz

Linear-time Computation of Minimal Absent Words Using Suffix Array

An absent word of a word y of length n is a word that does not occur in y. It is a minimal absent word if all its proper factors occur in y. Minimal absent words have been computed in genomes of organisms from all domains of life; their…

Data Structures and Algorithms · Computer Science 2014-07-01 Carl Barton, Alice Heliou, Laurent Mouchard, Solon P. Pissis

Quantum implementation of circulant matrices and its use in quantum string processing

String problems in general can be solved faster by using special data structures built on suffixes, in many cases structured as trees and arrays. In this paper, we show that suffixes used in those data structures can be obtained by using…

Quantum Physics · Physics 2023-02-21 Ammar Daskin

Suffix tree-based linear algorithms for multiple prefixes, single suffix counting and listing problems

Given two strings $T$ and $S$ and a set of strings $P$, for each string $p \in P$, consider the unique substrings of $T$ that have $p$ as their prefix and $S$ as their suffix. Two problems then come to mind; the first problem being the…

Data Structures and Algorithms · Computer Science 2022-04-19 Laurentius Leonard, Ken Tanaka

On-line construction of position heaps

We propose a simple linear-time on-line algorithm for constructing a position heap for a string [Ehrenfeucht et al, 2011]. Our definition of position heap differs slightly from the one proposed in [Ehrenfeucht et al, 2011] in that it…

Data Structures and Algorithms · Computer Science 2015-03-19 Gregory Kucherov

A note on the complexity of addition

We show that the sum of a sequence of integers can be computed in linear time on a Turing machine. In particular, the most obvious algorithm for this problem, which appears to require quadratic time due to carry propagation, actually runs…

Computational Complexity · Computer Science 2023-06-16 Emil Jeřábek

Sampling the suffix array with minimizers

Sampling (evenly) the suffixes from the suffix array is an old idea trading the pattern search time for reduced index space. A few years ago Claude et al. showed an alphabet sampling scheme allowing for more efficient pattern searches…

Data Structures and Algorithms · Computer Science 2014-12-04 Szymon Grabowski, Marcin Raniszewski