Related papers: Contextual Pattern Matching
Document listing on string collections is the task of finding all documents where a pattern appears. It is regarded as the most fundamental document retrieval problem, and is useful in various applications. Many of the fastest-growing…
Given a string $P$ of length $m$, a longer string $T$ of length $n>m$, and two integers $l\geq 0$ and $r\geq 0$, the context of $P$ in $T$ is the set of all string pairs $(L,R)$, with $|L|=l$ and $|R|=r$, such that the string $LPR$ occurs…
The fundamental question considered in algorithms on strings is that of indexing, that is, preprocessing a given string for specific queries. By now we have a number of efficient solutions for this problem when the queries ask for an exact…
The classic string indexing problem is to preprocess a string S into a compact data structure that supports efficient pattern matching queries. Typical queries include existential queries (decide if the pattern occurs in S), reporting…
In this paper, we describe a new type of match between a pattern and a text that aren't necessarily maximal in the query, but still contain useful matching information: locally maximal exact matches (LEMs). There are usually a large amount…
Given a set of pattern strings $\mathcal{P}=\{P_1, P_2,\ldots P_k\}$ and a text string $S$, the classic dictionary matching problem is to report all occurrences of each pattern in $S$. We study the dictionary problem in the compressed…
A longest repeat query on a string, motivated by its applications in many subfields including computational biology, asks for the longest repetitive substring(s) covering a particular string position (point query). In this paper, we extend…
Most of the fastest-growing string collections today are repetitive, that is, most of the constituent documents are similar to many others. As these collections keep growing, a key approach to handling them is to exploit their…
The problem of finding factors of a text string which are identical or similar to a given pattern string is a central problem in computer science. A generalised version of this problem consists in implementing an index over the text to…
In this work, we consider pattern matching variants in small space, that is, in the read-only setting, where we want to bound the space usage on top of storing the strings. Our main contribution is a space-time trade-off for the Internal…
We consider document listing on string collections, that is, finding in which strings a given pattern appears. In particular, we focus on repetitive collections: a collection of size $N$ over alphabet $[1,\sigma]$ is composed of $D$ copies…
The string indexing problem is a fundamental computational problem with numerous applications, including information retrieval and bioinformatics. It aims to efficiently solve the pattern matching problem: given a text T of length n for…
We study a new variant of the string matching problem called cross-document string matching, which is the problem of indexing a collection of documents to support an efficient search for a pattern in a selected document, where the pattern…
Indexing highly repetitive texts --- such as genomic databases, software repositories and versioned text collections --- has become an important problem since the turn of the millennium. A relevant compressibility measure for repetitive…
Order-preserving pattern matching was introduced recently but it has already attracted much attention. Given a reference sequence and a pattern, we want to locate all substrings of the reference sequence whose elements have the same…
The classic string indexing problem is to preprocess a string $S$ into a compact data structure that supports efficient subsequent pattern matching queries, that is, given a pattern string $P$, report all occurrences of $P$ within $S$. In…
In a \emph{weighted sequence}, for every position of the sequence and every letter of the alphabet a probability of occurrence of this letter at this position is specified. Weighted sequences are commonly used to represent imprecise or…
The dictionary matching problem is to locate occurrences of any pattern among a set of patterns in a given text. Massive data sets abound and at the same time, there are many settings in which working space is extremely limited. We…
We introduce a data structure for counting pattern occurrences in texts compressed with any run-length context-free grammar. Our structure uses space proportional to the grammar size and counts the occurrences of a pattern of length $m$ in…
Pattern matching is a fundamental process in almost every scientific domain. The problem involves finding the positions of a given pattern (usually of short length) in a reference stream of data (usually of large length). The matching can…