Related papers: Document Counting in Practice

200 papers

Most of the fastest-growing string collections today are repetitive, that is, most of the constituent documents are similar to many others. As these collections keep growing, a key approach to handling them is to exploit their…

Information Retrieval · Computer Science 2017-05-22 Travis Gagie, Aleksi Hartikainen, Kalle Karhu, Juha Kärkkäinen, Gonzalo Navarro, Simon J. Puglisi, Jouni Sirén

Compared with constraint satisfaction problems, counting problems have received less attention. In this paper, we survey research works on the problems of counting the number of solutions to constraints. The constraints may take various…

Artificial Intelligence · Computer Science 2020-12-29 Jian Zhang, Cunjing Ge, Feifei Ma

The problem of storing a set of strings (a string dictionary) in compact form appears naturally in many cases. While classically it has represented a small part of the whole data to be processed (e.g., for Natural Language processing…

Data Structures and Algorithms · Computer Science 2011-01-31 Nieves R. Brisaboa, Rodrigo Cánovas, Miguel A. Martínez-Prieto, Gonzalo Navarro

Document listing on string collections is the task of finding all documents where a pattern appears. It is regarded as the most fundamental document retrieval problem, and is useful in various applications. Many of the fastest-growing…

Data Structures and Algorithms · Computer Science 2019-02-21 Dustin Cobas, Gonzalo Navarro

Document retrieval aims at finding the most important documents where a pattern appears in a collection of strings. Traditional pattern-matching techniques yield brute-force document retrieval solutions, which has motivated the research on…

Data Structures and Algorithms · Computer Science 2014-07-02 Gonzalo Navarro, Simon J. Puglisi, Jouni Sirén

The domains of data mining and knowledge discovery make use of large amounts of textual data, which need to be handled efficiently. Specific problems, like finding the maximum weight ordered common subset of a set of ordered sets or…

Data Structures and Algorithms · Computer Science 2009-12-07 Mugurel Ionut Andreica, Nicolae Tapus

We introduce a data structure for counting pattern occurrences in texts compressed with any run-length context-free grammar. Our structure uses space proportional to the grammar size and counts the occurrences of a pattern of length $m$ in…

Data Structures and Algorithms · Computer Science 2025-01-30 Gonzalo Navarro, Alejandro Pacheco

String constraint solving refers to solving combinatorial problems involving constraints over string variables. String solving approaches have become popular over the last years given the massive use of strings in different application…

Artificial Intelligence · Computer Science 2021-07-01 Roberto Amadini

We study a new variant of the string matching problem called cross-document string matching, which is the problem of indexing a collection of documents to support an efficient search for a pattern in a selected document, where the pattern…

Data Structures and Algorithms · Computer Science 2012-06-21 Gregory Kucherov, Yakov Nekrich, Tatiana Starikovskaya

Discovering patterns from data is an important task in data mining. There exist techniques to find large collections of many kinds of patterns from data very efficiently. A collection of patterns can be regarded as a summary of the data. A…

Databases · Computer Science 2007-05-23 Taneli Mielikäinen

Two decades ago, a breakthrough in indexing string collections made it possible to represent them within their compressed space while at the same time offering indexed search functionalities. As this new technology permeated through…

Data Structures and Algorithms · Computer Science 2022-11-28 Gonzalo Navarro

The string-matching field has grown to such a complicated stage that various issues come into play when studying it: data structure and algorithmic design, database principles, compression techniques, architectural features, cache and…

Data Structures and Algorithms · Computer Science 2008-01-16 Paolo Ferragina

This paper tries to shed light on the usage of data structures in the field of information retrieval. Information retrieval is an area of study which is gaining momentum as the need and urge for sharing and exploring information is growing…

Information Retrieval · Computer Science 2016-02-26 V. R. Kanagavalli, G. Maheeja

There are two methods for counting the number of occurrences of a string in another large string. One is to count the number of places where the string is found. The other is to determine how many pieces of string can be extracted without…

Data Structures and Algorithms · Computer Science 2022-11-09 Ayaka Takamoto, Mitsuo Yoshida, Kyoji Umemura

Text datasets can be represented using models that do not preserve text structure, or using models that preserve text structure. Our hypothesis is that depending on the dataset nature, there can be advantages using a model that preserves…

Information Theory · Computer Science 2025-02-04 Ana Granados, Kostadin Koroutchev, Francisco de Borja Rodríguez

The task of Argument Mining, that is, extracting and classifying argument components for a specific topic from large document sources, is inherently difficult for machine learning models and humans alike, as large Argument Mining…

Computation and Language · Computer Science 2024-10-08 Benjamin Schiller, Johannes Daxenberger, Andreas Waldis, Iryna Gurevych

A challenging case in web search and question answering are count queries, such as "number of songs by John Lennon". Prior methods merely answer these with a single, and sometimes puzzling number or return a ranked list of text…

Information Retrieval · Computer Science 2022-08-31 Shrestha Ghosh, Simon Razniewski, Gerhard Weikum

The binary string matching problem consists in finding all the occurrences of a pattern in a text where both strings are built on a binary alphabet. This is an interesting problem in computer science, since binary data are omnipresent in…

Data Structures and Algorithms · Computer Science 2008-10-15 Simone Faro, Thierry Lecroq

We study the design of efficient algorithms for combinatorial pattern matching. More concretely, we study algorithms for tree matching, string matching, and string matching in compressed texts.

Data Structures and Algorithms · Computer Science 2007-09-03 Philip Bille

String diagrams are an increasingly popular algebraic language for the analysis of graphical models of computations across different research fields. Whereas string diagrams have been thoroughly studied as semantic structures, much less…

Category Theory · Mathematics 2022-11-04 Paul Wilson, Fabio Zanasi