Related papers: Contextual Pattern Matching

Fast, Small, and Simple Document Listing on Repetitive Text Collections

Document listing on string collections is the task of finding all documents where a pattern appears. It is regarded as the most fundamental document retrieval problem, and is useful in various applications. Many of the fastest-growing…

Data Structures and Algorithms · Computer Science 2019-02-21 Dustin Cobas, Gonzalo Navarro

Contextual Pattern Mining and Counting

Given a string $P$ of length $m$, a longer string $T$ of length $n>m$, and two integers $l\geq 0$ and $r\geq 0$, the context of $P$ in $T$ is the set of all string pairs $(L,R)$, with $|L|=l$ and $|R|=r$, such that the string $LPR$ occurs…

Data Structures and Algorithms · Computer Science 2025-06-24 Ling Li, Daniel Gibney, Sharma V. Thankachan, Solon P. Pissis, Grigorios Loukides

Compressed Indexing for Consecutive Occurrences

The fundamental question considered in algorithms on strings is that of indexing, that is, preprocessing a given string for specific queries. By now we have a number of efficient solutions for this problem when the queries ask for an exact…

Data Structures and Algorithms · Computer Science 2023-04-04 Paweł Gawrychowski, Garance Gourdel, Tatiana Starikovskaya, Teresa Anna Steiner

Gapped Indexing for Consecutive Occurrences

The classic string indexing problem is to preprocess a string S into a compact data structure that supports efficient pattern matching queries. Typical queries include existential queries (decide if the pattern occurs in S), reporting…

Data Structures and Algorithms · Computer Science 2021-02-05 Philip Bille, Inge Li Gørtz, Max Rishøj Pedersen, Teresa Anna Steiner

An Efficient Data Structure and Algorithm for Long-Match Query in Run-Length Compressed BWT

In this paper, we describe a new type of match between a pattern and a text that isn't necessarily maximal in the query, but still contains useful matching information: locally maximal exact matches (LEMs). There are usually a large amount…

Data Structures and Algorithms · Computer Science 2025-05-22 Ahsan Sanaullah, Degui Zhi, Shaojie Zhang

Compressed Dictionary Matching on Run-Length Encoded Strings

Given a set of pattern strings $\mathcal{P}=\{P_1, P_2,\ldots P_k\}$ and a text string $S$, the classic dictionary matching problem is to report all occurrences of each pattern in $S$. We study the dictionary problem in the compressed…

Data Structures and Algorithms · Computer Science 2025-09-04 Philip Bille, Inge Li Gørtz, Simon J. Puglisi, Simon R. Tarnow

On Stabbing Queries for Generalized Longest Repeat

A longest repeat query on a string, motivated by its applications in many subfields including computational biology, asks for the longest repetitive substring(s) covering a particular string position (point query). In this paper, we extend…

Data Structures and Algorithms · Computer Science 2015-11-10 Bojian Xu

Document Retrieval on Repetitive String Collections

Most of the fastest-growing string collections today are repetitive, that is, most of the constituent documents are similar to many others. As these collections keep growing, a key approach to handling them is to exploit their…

Information Retrieval · Computer Science 2017-05-22 Travis Gagie, Aleksi Hartikainen, Kalle Karhu, Juha Kärkkäinen, Gonzalo Navarro, Simon J. Puglisi, Jouni Sirén

Efficient Index for Weighted Sequences

The problem of finding factors of a text string which are identical or similar to a given pattern string is a central problem in computer science. A generalised version of this problem consists in implementing an index over the text to…

Data Structures and Algorithms · Computer Science 2016-02-04 Carl Barton, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski

Internal Pattern Matching in Small Space and Applications

In this work, we consider pattern matching variants in small space, that is, in the read-only setting, where we want to bound the space usage on top of storing the strings. Our main contribution is a space-time trade-off for the Internal…

Data Structures and Algorithms · Computer Science 2024-04-29 Gabriel Bathie, Panagiotis Charalampopoulos, Tatiana Starikovskaya

Document Listing on Repetitive Collections with Guaranteed Performance

We consider document listing on string collections, that is, finding in which strings a given pattern appears. In particular, we focus on repetitive collections: a collection of size $N$ over alphabet $[1,\sigma]$ is composed of $D$ copies…

Data Structures and Algorithms · Computer Science 2018-11-15 Gonzalo Navarro

Data Structures for Range Sorted Consecutive Occurrence Queries

The string indexing problem is a fundamental computational problem with numerous applications, including information retrieval and bioinformatics. It aims to efficiently solve the pattern matching problem: given a text T of length n for…

Data Structures and Algorithms · Computer Science 2025-09-03 Waseem Akram, Takuya Mieno

Cross-Document Pattern Matching

We study a new variant of the string matching problem called cross-document string matching, which is the problem of indexing a collection of documents to support an efficient search for a pattern in a selected document, where the pattern…

Data Structures and Algorithms · Computer Science 2012-06-21 Gregory Kucherov, Yakov Nekrich, Tatiana Starikovskaya

Optimal-Time Text Indexing in BWT-runs Bounded Space

Indexing highly repetitive texts (such as genomic databases, software repositories and versioned text collections) has become an important problem since the turn of the millennium. A relevant compressibility measure for repetitive…

Data Structures and Algorithms · Computer Science 2017-07-13 Travis Gagie, Gonzalo Navarro, Nicola Prezza

A Compact Index for Order-Preserving Pattern Matching

Order-preserving pattern matching was introduced recently but it has already attracted much attention. Given a reference sequence and a pattern, we want to locate all substrings of the reference sequence whose elements have the same…

Data Structures and Algorithms · Computer Science 2018-12-11 Gianni Decaroli, Travis Gagie, Giovanni Manzini

String Indexing for Top-$k$ Close Consecutive Occurrences

The classic string indexing problem is to preprocess a string $S$ into a compact data structure that supports efficient subsequent pattern matching queries, that is, given a pattern string $P$, report all occurrences of $P$ within $S$. In…

Data Structures and Algorithms · Computer Science 2024-02-15 Philip Bille, Inge Li Gørtz, Max Rishøj Pedersen, Eva Rotenberg, Teresa Anna Steiner

Indexing Weighted Sequences: Neat and Efficient

In a weighted sequence, for every position of the sequence and every letter of the alphabet a probability of occurrence of this letter at this position is specified. Weighted sequences are commonly used to represent imprecise or…

Data Structures and Algorithms · Computer Science 2017-08-28 Carl Barton, Tomasz Kociumaka, Chang Liu, Solon P. Pissis, Jakub Radoszewski

Engineering Small Space Dictionary Matching

The dictionary matching problem is to locate occurrences of any pattern among a set of patterns in a given text. Massive data sets abound and at the same time, there are many settings in which working space is extremely limited. We…

Data Structures and Algorithms · Computer Science 2013-01-29 Shoshana Marcus, Dina Sokol

Counting on General Run-Length Grammars

We introduce a data structure for counting pattern occurrences in texts compressed with any run-length context-free grammar. Our structure uses space proportional to the grammar size and counts the occurrences of a pattern of length $m$ in…

Data Structures and Algorithms · Computer Science 2025-01-30 Gonzalo Navarro, Alejandro Pacheco

Linear Approximate Pattern Matching Algorithm

Pattern matching is a fundamental process in almost every scientific domain. The problem involves finding the positions of a given pattern (usually of short length) in a reference stream of data (usually of large length). The matching can…

Data Structures and Algorithms · Computer Science 2022-07-01 Anas Al-okaily, Abdelghani Tbakhi