Related papers: Resilient Pattern Mining
In this work, we consider pattern matching variants in small space, that is, in the read-only setting, where we want to bound the space usage on top of storing the strings. Our main contribution is a space-time trade-off for the Internal…
Permutation patterns and pattern avoidance are central, well-studied concepts in combinatorics and computer science. Given two permutations $\tau$ and $\pi$, the pattern matching problem (PPM) asks whether $\tau$ contains $\pi$. This…
Frequent pattern mining is widely used to find ``important'' or ``interesting'' patterns in data. While it is not easy to mathematically define such patterns, maximal frequent patterns are promising candidates, as frequency is a natural…
Given a string $P$ of length $m$, a longer string $T$ of length $n>m$, and two integers $l\geq 0$ and $r\geq 0$, the context of $P$ in $T$ is the set of all string pairs $(L,R)$, with $|L|=l$ and $|R|=r$, such that the string $LPR$ occurs…
Sequential pattern mining (SPM) is an important branch of knowledge discovery that aims to mine frequent sub-sequences (patterns) in a sequential database. Various SPM methods have been investigated, and most of them are classical SPM…
Sequential pattern mining (SPM) is an important pattern mining technique with many real-world applications. Although many efficient sequential pattern mining algorithms have been proposed, few studies focus on target…
The most fundamental problem considered in algorithms for text processing is pattern matching: given a pattern $p$ of length $m$ and a text $t$ of length $n$, does $p$ occur in $t$? Multiple versions of this basic question have been…
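The basic question stated above (does $p$ occur in $t$?) can be illustrated with a minimal sketch. This is the textbook naive check, not the method of the listed paper; the function name `occurs` is a hypothetical choice made for this illustration.

```python
def occurs(p: str, t: str) -> bool:
    """Return True if pattern p occurs in text t as a contiguous substring.

    Naive check (an illustrative assumption, not the paper's algorithm):
    try every alignment of p against t, which costs O(n * m) time in the
    worst case for |p| = m and |t| = n.
    """
    m, n = len(p), len(t)
    # If m > n the range is empty and the result is False.
    return any(t[i:i + m] == p for i in range(n - m + 1))
```

Linear-time algorithms such as Knuth-Morris-Pratt answer the same question in $O(n+m)$ time; the naive version is shown only because it mirrors the definition directly.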
Repeat finding in strings has important applications in subfields such as computational biology. Surprisingly, no prior work on repeat finding has considered the constraint on the locality of repeats. In this paper, we propose and study…
Permutation Pattern Matching (or PPM) is a decision problem whose input is a pair of permutations $\pi$ and $\tau$, represented as sequences of integers, and the task is to determine whether $\tau$ contains a subsequence order-isomorphic to…
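As a rough illustration of what "contains a subsequence order-isomorphic to a pattern" means, the brute-force sketch below tests every length-$|\pi|$ subsequence of $\tau$. It is exponential in general and is not the algorithm of any listed paper; the names `shape` and `contains_pattern` are hypothetical.

```python
from itertools import combinations


def shape(seq):
    """Map a sequence of distinct integers to its tuple of ranks (0-based)."""
    order = sorted(range(len(seq)), key=lambda i: seq[i])
    ranks = [0] * len(seq)
    for rank, index in enumerate(order):
        ranks[index] = rank
    return tuple(ranks)


def contains_pattern(tau, pi):
    """Return True if tau has a subsequence order-isomorphic to pi.

    Brute force (illustrative assumption): combinations() keeps elements in
    their original left-to-right order, so each tuple is a subsequence.
    """
    target = shape(pi)
    return any(shape(sub) == target for sub in combinations(tau, len(pi)))
```

For example, `contains_pattern([3, 1, 4, 2], [1, 3, 2])` holds because the subsequence 1, 4, 2 has the same relative order as 1, 3, 2.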
Nowadays, frequent pattern mining (FPM) on large graphs receives increasing attention, since it is crucial to a variety of applications, e.g., social analysis. Informally, the FPM problem is defined as finding all the patterns in a large…
In this paper, we consider the ``Shortest Superstring Problem'' (SSP) or the ``Shortest Common Superstring Problem'' (SCS). The problem is as follows. For a positive integer $n$, a sequence of $n$ strings $S=(s^1,\dots,s^n)$ is given. We should…
The selection problem, where one wishes to locate the $k^{th}$ smallest element in an unsorted array of size $n$, is one of the basic problems studied in computer science. The main focus of this work is designing algorithms for solving the…
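For readers unfamiliar with the selection problem, here is a minimal sketch of the classical randomized quickselect approach. It is a standard textbook technique, not the algorithm developed in the listed paper; the function name `select` and the 1-based convention for `k` are assumptions made for this example.

```python
import random


def select(values, k):
    """Return the k-th smallest element of values (k is 1-based).

    Randomized quickselect (illustrative assumption): partition around a
    random pivot and keep only the part that contains the answer. Expected
    running time is O(n).
    """
    if not 1 <= k <= len(values):
        raise ValueError("k out of range")
    items = list(values)
    while True:
        pivot = random.choice(items)
        smaller = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        if k <= len(smaller):
            items = smaller
        elif k <= len(smaller) + len(equal):
            return pivot
        else:
            # Discard everything up to and including the pivot's copies.
            k -= len(smaller) + len(equal)
            items = [x for x in items if x > pivot]
```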
Time series are ubiquitous in domains ranging from medicine to marketing and finance. Frequent Pattern Mining (FPM) from a time series has thus received much attention. Recently, it has been studied under the order-preserving (OP) matching…
The string indexing problem is a fundamental computational problem with numerous applications, including information retrieval and bioinformatics. It aims to efficiently solve the pattern matching problem: given a text $T$ of length $n$ for…
We study algorithms for solving three problems on strings. The first is the Most Frequently String Search Problem, defined as follows. Assume that we have a sequence of $n$ strings of length $k$. The problem is finding the…
This paper proposes a frequent pattern mining algorithm based on a support vector machine (SVM), aiming to solve the performance bottleneck of traditional frequent pattern mining algorithms in high-dimensional and sparse data…
Given $m$ documents of total length $n$, we consider the problem of finding a longest string common to at least $d \geq 2$ of the documents. This problem is known as the \emph{longest common substring (LCS) problem} and has a classic $O(n)$…
This paper develops a memory-efficient approach for Sequential Pattern Mining (SPM), a fundamental topic in knowledge discovery that faces a well-known memory bottleneck for large data sets. Our methodology involves a novel hybrid trie data…
We consider the problem of querying a string (or, a database) of length $N$ bits to determine all the locations where a substring (query) of length $M$ appears either exactly or is within a Hamming distance of $K$ from the query. We assume…
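The query described above, reporting every position where the query matches exactly or within Hamming distance $K$, can be sketched with a direct scan. This is only a baseline illustration under the assumption that the database and query are given as strings of `0` and `1` characters; it is not the method of the listed paper, and `approx_matches` is a hypothetical name.

```python
def approx_matches(database: str, query: str, k: int) -> list[int]:
    """Return all start positions where query occurs with at most k mismatches.

    Direct scan (illustrative assumption): compare the query against every
    length-M window of the database, costing O(N * M) time.
    """
    m = len(query)
    hits = []
    for i in range(len(database) - m + 1):
        window = database[i:i + m]
        # Hamming distance: number of positions where the symbols differ.
        mismatches = sum(a != b for a, b in zip(window, query))
        if mismatches <= k:
            hits.append(i)
    return hits
```

Setting `k = 0` reduces the scan to exact matching, so the same function covers both cases named in the abstract.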
With the growing popularity of shared resources, large volumes of complex data of different types are collected automatically. Traditional data mining algorithms generally face challenges including huge memory cost, low…