Related papers: Internal Dictionary Matching
We consider the problem of preprocessing a text $T$ of length $n$ and a dictionary $\mathcal{D}$ in order to be able to efficiently answer queries $CountDistinct(i,j)$, that is, given $i$ and $j$ return the number of patterns from…
We study the internal dictionary matching (IDM) problem where a dictionary $\mathcal{D}$ containing $d$ substrings of a text $T$ is given, and each query concerns the occurrences of patterns in $\mathcal{D}$ in another substring of $T$. We…
We consider several types of internal queries, that is, questions about fragments of a given text $T$ specified in constant space by their locations in $T$. Our main result is an optimal data structure for Internal Pattern Matching (IPM)…
We consider document listing on string collections, that is, finding in which strings a given pattern appears. In particular, we focus on repetitive collections: a collection of size $N$ over alphabet $[1,\sigma]$ is composed of $D$ copies…
Internal pattern matching requires one to answer queries about factors of a given string. Many results are known on answering internal period queries, asking for the periods of a given factor. In this paper we investigate (for the first…
Let $\mathcal{D}$ be a collection of $D$ documents, which are strings over an alphabet of size $\sigma$, of total length $n$. We describe a data structure that uses linear space and and reports $k$ most relevant documents that contain a…
The string indexing problem is a fundamental computational problem with numerous applications, including information retrieval and bioinformatics. It aims to efficiently solve the pattern matching problem: given a text T of length n for…
The dictionary matching with gaps problem is to preprocess a dictionary $D$ of $d$ gapped patterns $P_1,\ldots,P_d$ over alphabet $\Sigma$, where each gapped pattern $P_i$ is a sequence of subpatterns separated by bounded sequences of don't…
We introduce a data structure for counting pattern occurrences in texts compressed with any run-length context-free grammar. Our structure uses space proportional to the grammar size and counts the occurrences of a pattern of length $m$ in…
Given a set of pattern strings $\mathcal{P}=\{P_1, P_2,\ldots P_k\}$ and a text string $S$, the classic dictionary matching problem is to report all occurrences of each pattern in $S$. We study the dictionary problem in the compressed…
In this work, we consider pattern matching variants in small space, that is, in the read-only setting, where we want to bound the space usage on top of storing the strings. Our main contribution is a space-time trade-off for the Internal…
The dictionary matching problem preprocesses a set of patterns and finds all occurrences of each of the patterns in a text when it is provided. We focus on the dynamic setting, in which patterns can be inserted to and removed from the…
We consider the problem of dictionary matching in a stream. Given a set of strings, known as a dictionary, and a stream of characters arriving one at a time, the task is to report each time some string in our dictionary occurs in the…
The dictionary matching is a task to find all occurrences of patterns in a set $D$ (called a dictionary) on a text $T$. The Aho-Corasick-automaton (AC-automaton) is a data structure which enables us to solve the dictionary matching problem…
The compressed indexing problem is to preprocess a string $S$ of length $n$ into a compressed representation that supports pattern matching queries. That is, given a string $P$ of length $m$ report all occurrences of $P$ in $S$. We present…
We revisit the fundamental problem of dictionary look-up with mismatches. Given a set (dictionary) of $d$ strings of length $m$ and an integer $k$, we must preprocess it into a data structure to answer the following queries: Given a query…
Let ${\cal{D}}$ = $\{d_1, d_2, d_3, ..., d_D\}$ be a given set of $D$ (string) documents of total length $n$. The top-$k$ document retrieval problem is to index $\cal{D}$ such that when a pattern $P$ of length $p$, and a parameter $k$ come…
We consider string matching with variable length gaps. Given a string $T$ and a pattern $P$ consisting of strings separated by variable length gaps (arbitrary strings of length in a specified range), the problem is to find all ending…
For every fixed $d \in \mathbb{N}$, we design a data structure that represents a binary $n \times n$ matrix that is $d$-twin-ordered. The data structure occupies $O_d(n)$ bits, which is the least one could hope for, and can be queried for…
We describe the first self-indexes able to count and locate pattern occurrences in optimal time within a space bounded by the size of the most popular dictionary compressors. To achieve this result we combine several recent findings,…