Related papers: Deterministic Indexing for Packed Strings
It is widely assumed that $O(m+\lg \sigma)$ is the best one can do for finding a pattern of length $m$ in a compacted trie storing strings over an alphabet of size $\sigma$, if one insists on linear-size data structures and deterministic…
Given a string $S$ over an alphabet $\Sigma$, the 'string indexing problem' is to preprocess $S$ to subsequently support efficient pattern matching queries, i.e., given a pattern string $P$ report all the occurrences of $P$ in $S$. In this…
We introduce a compressed suffix array representation that, on a text $T$ of length $n$ over an alphabet of size $\sigma$, can be built in $O(n)$ deterministic time, within $O(n\log\sigma)$ bits of working space, and counts the number of…
Given a set of pattern strings $\mathcal{P}=\{P_1, P_2,\ldots, P_k\}$ and a text string $S$, the classic dictionary matching problem is to report all occurrences of each pattern in $S$. We study the dictionary problem in the compressed…
Given a string $S$ of length $n$, the classic string indexing problem is to preprocess $S$ into a compact data structure that supports efficient subsequent pattern queries. In this paper we consider the basic variant where the pattern is…
Given a pattern $P$ and a text $T$, both strings over a binary alphabet, the binary jumbled string matching problem consists in telling whether any permutation of $P$ occurs in $T$. The indexed version of this problem, i.e., preprocessing a…
The compressed indexing problem is to preprocess a string $S$ of length $n$ into a compressed representation that supports pattern matching queries. That is, given a string $P$ of length $m$ report all occurrences of $P$ in $S$. We present…
We show that the compressed suffix array and the compressed suffix tree of a string $T$ can be built in $O(n)$ deterministic time using $O(n\log\sigma)$ bits of space, where $n$ is the string length and $\sigma$ is the alphabet size.…
Packing several characters into one computer word is a simple and natural way to compress the representation of a string and to speed up its processing. Exploiting this idea, we propose an index for a packed string, based on a sparse…
The Binary Jumbled String Matching problem is defined as: Given a string $s$ over $\{a,b\}$ of length $n$ and a query $(x,y)$, with $x,y$ non-negative integers, decide whether $s$ has a substring $t$ with exactly $x$ $a$'s and $y$ $b$'s.…
Given strings $P$ and $Q$ the (exact) string matching problem is to find all positions of substrings in $Q$ matching $P$. The classical Knuth-Morris-Pratt algorithm [SIAM J. Comput., 1977] solves the string matching problem in linear time…
Suppose that we are given a string $s$ of length $n$ over an alphabet $\{0,1,\ldots,n^{O(1)}\}$ and $\delta$ is the string complexity of $s$, a known compression measure. We describe an index on $s$ with $O(\delta\log\frac{n}{\delta})$…
In a weighted sequence, for every position of the sequence and every letter of the alphabet a probability of occurrence of this letter at this position is specified. Weighted sequences are commonly used to represent imprecise or…
In this paper we describe compressed indexes that support pattern matching queries for strings with wildcards. For a constant size alphabet our data structure uses $O(n\log^{\varepsilon}n)$ bits for any $\varepsilon>0$ and reports all…
Given a pattern string $P$ of length $n$ consisting of $\delta$ distinct characters and a query string $T$ of length $m$, where the characters of $P$ and $T$ are drawn from an alphabet $\Sigma$ of size $\Delta$, the exact string…
We consider the problem of indexing a string $t$ of length $n$ to report the occurrences of a query pattern $p$ containing $m$ characters and $j$ wildcards. Let $occ$ be the number of occurrences of $p$ in $t$, and $\sigma$ the size of the…
We consider the problem of maintaining a collection of strings while efficiently supporting splits and concatenations on them, as well as comparing two substrings, and computing the longest common prefix between two suffixes. This problem…
We consider the Parameterized Pattern Matching problem, where a pattern $P$ matches some location in a text $\mathsf{T}$ iff there is a one-to-one correspondence between the alphabet symbols of the pattern and those of the text. More…
We consider document listing on string collections, that is, finding in which strings a given pattern appears. In particular, we focus on repetitive collections: a collection of size $N$ over alphabet $[1,\sigma]$ is composed of $D$ copies…
We consider string matching with variable length gaps. Given a string $T$ and a pattern $P$ consisting of strings separated by variable length gaps (arbitrary strings of length in a specified range), the problem is to find all ending…