Related papers: Document Counting in Practice
Most of the fastest-growing string collections today are repetitive, that is, most of the constituent documents are similar to many others. As these collections keep growing, a key approach to handling them is to exploit their…
Compared with constraint satisfaction problems, counting problems have received less attention. In this paper, we survey research works on the problems of counting the number of solutions to constraints. The constraints may take various…
The problem of storing a set of strings --- a string dictionary --- in compact form appears naturally in many cases. While classically it has represented a small part of the whole data to be processed (e.g., for Natural Language processing…
Document listing on string collections is the task of finding all documents where a pattern appears. It is regarded as the most fundamental document retrieval problem, and is useful in various applications. Many of the fastest-growing…
Document retrieval aims at finding the most important documents where a pattern appears in a collection of strings. Traditional pattern-matching techniques yield brute-force document retrieval solutions, which has motivated the research on…
The domains of data mining and knowledge discovery make use of large amounts of textual data, which need to be handled efficiently. Specific problems, like finding the maximum weight ordered common subset of a set of ordered sets or…
We introduce a data structure for counting pattern occurrences in texts compressed with any run-length context-free grammar. Our structure uses space proportional to the grammar size and counts the occurrences of a pattern of length $m$ in…
String constraint solving refers to solving combinatorial problems involving constraints over string variables. String solving approaches have become popular over the last years given the massive use of strings in different application…
We study a new variant of the string matching problem called cross-document string matching, which is the problem of indexing a collection of documents to support an efficient search for a pattern in a selected document, where the pattern…
Discovering patterns from data is an important task in data mining. There exist techniques to find large collections of many kinds of patterns from data very efficiently. A collection of patterns can be regarded as a summary of the data. A…
Two decades ago, a breakthrough in indexing string collections made it possible to represent them within their compressed space while at the same time offering indexed search functionalities. As this new technology permeated through…
The string-matching field has grown at a such complicated stage that various issues come into play when studying it: data structure and algorithmic design, database principles, compression techniques, architectural features, cache and…
This paper tries to throw light in the usage of data structures in the field of information retrieval. Information retrieval is an area of study which is gaining momentum as the need and urge for sharing and exploring information is growing…
There are two methods for counting the number of occurrences of a string in another large string. One is to count the number of places where the string is found. The other is to determine how many pieces of string can be extracted without…
Text datasets can be represented using models that do not preserve text structure, or using models that preserve text structure. Our hypothesis is that depending on the dataset nature, there can be advantages using a model that preserves…
The task of Argument Mining, that is extracting and classifying argument components for a specific topic from large document sources, is an inherently difficult task for machine learning models and humans alike, as large Argument Mining…
A challenging case in web search and question answering are count queries, such as \textit{"number of songs by John Lennon"}. Prior methods merely answer these with a single, and sometimes puzzling number or return a ranked list of text…
The binary string matching problem consists in finding all the occurrences of a pattern in a text where both strings are built on a binary alphabet. This is an interesting problem in computer science, since binary data are omnipresent in…
We study the design of efficient algorithms for combinatorial pattern matching. More concretely, we study algorithms for tree matching, string matching, and string matching in compressed texts.
String diagrams are an increasingly popular algebraic language for the analysis of graphical models of computations across different research fields. Whereas string diagrams have been thoroughly studied as semantic structures, much less…