Related papers: Improved Circular Dictionary Matching
Given a set of pattern strings $\mathcal{P}=\{P_1, P_2,\ldots P_k\}$ and a text string $S$, the classic dictionary matching problem is to report all occurrences of each pattern in $S$. We study the dictionary problem in the compressed…
The problem of dictionary matching is a classical problem in string matching: given a set S of d strings of total length n characters over an (not necessarily constant) alphabet of size sigma, build a data structure so that we can match in…
Given $d$ strings over the alphabet $\{0,1,\ldots,\sigma{-}1\}$, the classical Aho--Corasick data structure allows us to find all $occ$ occurrences of the strings in any text $T$ in $O(|T| + occ)$ time using $O(m\log m)$ bits of space,…
We present a new algorithm for subsequence matching in grammar compressed strings. Given a grammar of size $n$ compressing a string of size $N$ and a pattern string of size $m$ over an alphabet of size $\sigma$, our algorithm uses…
Approximate string matching is the problem of finding all factors of a text t of length n that are at a distance at most k from a pattern x of length m. Approximate circular string matching is the problem of finding all factors of t that…
We revisit the fundamental problem of dictionary look-up with mismatches. Given a set (dictionary) of $d$ strings of length $m$ and an integer $k$, we must preprocess it into a data structure to answer the following queries: Given a query…
Pattern matching is a fundamental process in almost every scientific domain. The problem involves finding the positions of a given pattern (usually of short length) in a reference stream of data (usually of large length). The matching can…
The dictionary matching with gaps problem is to preprocess a dictionary $D$ of $d$ gapped patterns $P_1,\ldots,P_d$ over alphabet $\Sigma$, where each gapped pattern $P_i$ is a sequence of subpatterns separated by bounded sequences of don't…
The most fundamental problem considered in algorithms for text processing is pattern matching: given a pattern $p$ of length $m$ and a text $t$ of length $n$, does $p$ occur in $t$? Multiple versions of this basic question have been…
Two recent lower bounds on the compressibility of repetitive sequences, $\delta \le \gamma$, have received much attention. It has been shown that a length-$n$ string $S$ over an alphabet of size $\sigma$ can be represented within the…
The compressed indexing problem is to preprocess a string $S$ of length $n$ into a compressed representation that supports pattern matching queries. That is, given a string $P$ of length $m$ report all occurrences of $P$ in $S$. We present…
In this paper we describe compressed indexes that support pattern matching queries for strings with wildcards. For a constant size alphabet our data structure uses $O(n\log^{\varepsilon}n)$ bits for any $\varepsilon>0$ and reports all…
We consider the problem of dictionary matching in a stream. Given a set of strings, known as a dictionary, and a stream of characters arriving one at a time, the task is to report each time some string in our dictionary occurs in the…
The dictionary matching problem is to locate occurrences of any pattern among a set of patterns in a given text. Massive data sets abound and at the same time, there are many settings in which working space is extremely limited. We…
The $k$-mismatch problem consists in computing the Hamming distance between a pattern $P$ of length $m$ and every length-$m$ substring of a text $T$ of length $n$, if this distance is no more than $k$. In many real-world applications, any…
We describe the first self-indexes able to count and locate pattern occurrences in optimal time within a space bounded by the size of the most popular dictionary compressors. To achieve this result we combine several recent findings,…
In this paper we are interested in indexing texts for substring matching queries with one edit error. That is, given a text $T$ of $n$ characters over an alphabet of size $\sigma$, we are asked to build a data structure that answers the…
In this paper we revisit the classical regular expression matching problem, namely, given a regular expression $R$ and a string $Q$, decide if $Q$ matches one of the strings specified by $R$. Let $m$ and $n$ be the length of $R$ and $Q$,…
Packing several characters into one computer word is a simple and natural way to compress the representation of a string and to speed up its processing. Exploiting this idea, we propose an index for a packed string, based on a {\em sparse…
The dictionary matching is a task to find all occurrences of patterns in a set $D$ (called a dictionary) on a text $T$. The Aho-Corasick-automaton (AC-automaton) is a data structure which enables us to solve the dictionary matching problem…