Related papers: Hidden Words Statistics for Large Patterns
We present a new algorithm for subsequence matching in grammar compressed strings. Given a grammar of size $n$ compressing a string of size $N$ and a pattern string of size $m$ over an alphabet of size $\sigma$, our algorithm uses…
The $d$-dimensional pattern matching problem is to find an occurrence of a pattern of length $m \times \dots \times m$ within a text of length $n \times \dots \times n$, with $n \ge m$. This task models various problems in text and image…
Given a random text over a finite alphabet, we study the frequencies at which fixed-length words occur as subsequences. As the data size grows, the joint distribution of word counts exhibits a rich asymptotic structure. We investigate all…
A weighted string over an alphabet of size $\sigma$ is a string in which a set of letters may occur at each position with respective occurrence probabilities. Weighted strings, also known as position weight matrices or uncertain sequences,…
The classical pattern matching asks for locating all occurrences of one string, called the pattern, in another, called the text, where a string is simply a sequence of characters. Due to the potential practical applications, it is desirable…
The equidistant subsequence pattern matching problem is considered. Given a pattern string $P$ and a text string $T$, we say that $P$ is an \emph{equidistant subsequence} of $T$ if $P$ is a subsequence of the text such that consecutive…
Let $S$ be a string of length $n$ over an alphabet $\Sigma$ and let $Q$ be a subset of $\Sigma$ of size $q \geq 2$. The 'co-occurrence problem' is to construct a compact data structure that supports the following query: given an integer $w$…
Given a pattern x of length m and a text y of length n, both over an ordered alphabet, the order-preserving pattern matching problem consists in finding all substrings of the text with the same relative order as the pattern. It is an…
The most fundamental problem considered in algorithms for text processing is pattern matching: given a pattern $p$ of length $m$ and a text $t$ of length $n$, does $p$ occur in $t$? Multiple versions of this basic question have been…
Given a set of pattern strings $\mathcal{P}=\{P_1, P_2,\ldots P_k\}$ and a text string $S$, the classic dictionary matching problem is to report all occurrences of each pattern in $S$. We study the dictionary problem in the compressed…
Let $w$ be a finite word over the alphabet $\{0,1\}$. For any natural number $n$, let $s_w(n)$ denote the number of occurrence of $w$ in the binary expansion of $n$ as a scattered subsequence. We study the behavior of the partial sum…
This paper gives a method for computing distributions associated with patterns in the state sequence of a hidden Markov model, conditional on observing all or part of the observation sequence. Probabilities are computed for very general…
The compressed indexing problem is to preprocess a string $S$ of length $n$ into a compressed representation that supports pattern matching queries. That is, given a string $P$ of length $m$ report all occurrences of $P$ in $S$. We present…
The classic exact pattern matching problem, given two strings -- a pattern $P$ of length $m$ and a text $T$ of length $n$ -- asks whether $P$ occurs as a substring of $T$. A property tester for the problem needs to distinguish (with high…
Pattern matching is a fundamental process in almost every scientific domain. The problem involves finding the positions of a given pattern (usually of short length) in a reference stream of data (usually of large length). The matching can…
The algorithmic tasks of computing the Hamming distance between a given pattern of length $m$ and each location in a text of length $n$ is one of the most fundamental algorithmic tasks in string algorithms. Unfortunately, there is evidence…
We present new algorithms for the problem of multiple string matching of gapped patterns, where a gapped pattern is a sequence of strings such that there is a gap of fixed length between each two consecutive strings. The problem has…
In this paper, we consider the problem of identifying patterns of interest in colored strings. A colored string is a string where each position is assigned one of a finite set of colors. Our task is to find substrings of the colored string…
Universal compression of patterns of sequences generated by independently identically distributed (i.i.d.) sources with unknown, possibly large, alphabets is investigated. A pattern is a sequence of indices that contains all consecutive…
We consider a class of pattern matching problems where a normalising transformation is applied at every alignment. Normalised pattern matching plays a key role in fields as diverse as image processing and musical information processing…