Related papers: Pattern Kits
Order-preserving pattern matching was introduced recently but it has already attracted much attention. Given a reference sequence and a pattern, we want to locate all substrings of the reference sequence whose elements have the same…
Data mining algorithms are now able to efficiently deal with huge amount of data. Various kinds of patterns may be discovered and may have some great impact on the general development of knowledge. In many domains, end users may want to…
In the dynamic indexing problem, we must maintain a changing collection of text documents so that we can efficiently support insertions, deletions, and pattern matching queries. We are especially interested in developing efficient data…
Inverted indexes allow to query large databases without needing to search in the database at each query. An important line of research is to construct the most efficient inverted indexes, both in terms of compression ratio and time…
The fundamental question considered in algorithms on strings is that of indexing, that is, preprocessing a given string for specific queries. By now we have a number of efficient solutions for this problem when the queries ask for an exact…
In this paper we present a new problem, the fast set intersection problem, which is to preprocess a collection of sets in order to efficiently report the intersection of any two sets in the collection. In addition we suggest new solutions…
We consider strategies to organize easily updatable associative arrays in external memory. These arrays are used for full-text search. We study indexes with different keys: single word form, two word forms, and sequences of word forms. The…
The dictionary matching problem is to locate occurrences of any pattern among a set of patterns in a given text. Massive data sets abound and at the same time, there are many settings in which working space is extremely limited. We…
Pattern spotting consists of searching in a collection of historical document images for occurrences of a graphical object using an image query. Contrary to object detection, no prior information nor predefined class is given about the…
Indexing is an effective way to support efficient query processing in large databases. Recently the concept of learned index, which replaces or complements traditional index structures with machine learning models, has been actively…
The dictionary matching problem preprocesses a set of patterns and finds all occurrences of each of the patterns in a text when it is provided. We focus on the dynamic setting, in which patterns can be inserted to and removed from the…
An extension to a recently introduced architecture of clique-based neural networks is presented. This extension makes it possible to store sequences with high efficiency. To obtain this property, network connections are provided with…
In today's data driven world, storing, processing, and gleaning insights from large-scale data are major challenges. Data compression is often required in order to store large amounts of high-dimensional data, and thus, efficient inference…
We consider how to index strings, trees and graphs for jumbled pattern matching when we are asked to return a match if one exists. For example, we show how, given a tree containing two colours, we can build a quadratic-space index with…
Patterns, which are collections of elements arranged in regular or near-regular arrangements, are an important graphic art form and widely used due to their elegant simplicity and aesthetic appeal. When a pattern is encoded as a flat image…
We study a new variant of the string matching problem called cross-document string matching, which is the problem of indexing a collection of documents to support an efficient search for a pattern in a selected document, where the pattern…
Coded caching is a promising technique to smooth out network traffic by storing part of the library content at the users' local caches. The seminal work on coded caching for single file retrieval by Maddah-Ali and Niesen (MAN) showed the…
We introduce a dynamic data structure for the compact representation of binary relations $\mathcal{R} \subseteq A \times B$. The data structure is a dynamic variant of the k$^2$-tree, a static compact representation that takes advantage of…
We introduce a new technique for the efficient management of large sequences of multidimensional data, which takes advantage of regularities that arise in real-world datasets and supports different types of aggregation queries. More…
The research on indexing repetitive string collections has focused on the same search problems used for regular string collections, though they can make little sense in this scenario. For example, the basic pattern matching query "list all…