Related papers: The Statistical Dictionary-based String Matching P…

Approximate String Matching: Theory and Applications (La Recherche Approchée de Motifs : Théorie et Applications)

Approximate string matching is a fundamental and recurrent problem that arises in most fields of computer science. It can be defined as follows: let $D=\{x_1,x_2,\ldots,x_d\}$ be a set of $d$ words defined on an alphabet…
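The abstract is cut off before the full definition. Assuming the usual formulation (given $D$, a query word $q$ and a threshold $k$, report every $x_i \in D$ whose edit distance to $q$ is at most $k$; the symbols $q$ and $k$ are our assumption), a brute-force baseline can be sketched as follows. The names `edit_distance` and `approximate_lookup` are hypothetical, and this is the naive approach such work improves on, not the method of the paper.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic program, O(|a|*|b|) time, O(|b|) space."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (cost 1) or match (cost 0)
            )
        prev = cur
    return prev[-1]


def approximate_lookup(dictionary, query: str, k: int) -> list:
    """Hypothetical helper: every dictionary word within edit distance k of query (brute force)."""
    return [w for w in dictionary if edit_distance(w, query) <= k]
```

For example, `approximate_lookup(["cat", "car", "dog", "cart"], "cat", 1)` returns `["cat", "car", "cart"]`. Each query costs one dynamic program per dictionary word, which is what index-based approaches aim to avoid.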

Data Structures and Algorithms · Computer Science 2017-01-31 Ibrahim Chegrane

Dictionary matching in a stream

We consider the problem of dictionary matching in a stream. Given a set of strings, known as a dictionary, and a stream of characters arriving one at a time, the task is to report each time some string in our dictionary occurs in the…
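For orientation, the textbook way to report dictionary occurrences one character at a time is an Aho-Corasick automaton. The sketch below is that classic technique, not the small-space streaming algorithm of the paper: it stores the whole trie, so its space is linear in the total length of the dictionary. The class and method names (`AhoCorasick`, `push`) are hypothetical, and patterns are assumed to be non-empty.

```python
from collections import deque


class AhoCorasick:
    """Classic Aho-Corasick automaton; space is linear in the total dictionary length."""

    def __init__(self, patterns):
        self.goto = [{}]   # goto[state][char] -> child state (trie edges)
        self.fail = [0]    # failure link of each state
        self.out = [[]]    # dictionary strings that end when this state is reached
        for p in patterns:
            self._insert(p)
        self._build_failure_links()
        self.state = 0     # current state while consuming the stream

    def _insert(self, pattern):
        s = 0
        for ch in pattern:
            if ch not in self.goto[s]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[s][ch] = len(self.goto) - 1
            s = self.goto[s][ch]
        self.out[s].append(pattern)

    def _build_failure_links(self):
        # Breadth-first order guarantees that a state's failure target is finished first.
        queue = deque(self.goto[0].values())  # depth-1 states fail to the root
        while queue:
            s = queue.popleft()
            for ch, t in self.goto[s].items():
                queue.append(t)
                f = self.fail[s]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(ch, 0)
                # Inherit the matches that end at the failure target (shorter suffixes).
                self.out[t] = self.out[t] + self.out[self.fail[t]]

    def push(self, ch):
        """Consume one stream character; return the dictionary strings ending at it."""
        s = self.state
        while s and ch not in self.goto[s]:
            s = self.fail[s]
        self.state = self.goto[s].get(ch, 0)
        return list(self.out[self.state])
```

Feeding `"ushers"` one character at a time with the dictionary `{"he", "she", "his", "hers"}` reports `she` and `he` at the fourth character and `hers` at the sixth.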

Data Structures and Algorithms · Computer Science 2015-04-24 Raphael Clifford, Allyx Fontaine, Ely Porat, Benjamin Sach, Tatiana Starikovskaya

Document Retrieval on Repetitive String Collections

Most of the fastest-growing string collections today are repetitive, that is, most of the constituent documents are similar to many others. As these collections keep growing, a key approach to handling them is to exploit their…

Information Retrieval · Computer Science 2017-05-22 Travis Gagie, Aleksi Hartikainen, Kalle Karhu, Juha Kärkkäinen, Gonzalo Navarro, Simon J. Puglisi, Jouni Sirén

Efficient Online String Matching Based on Characters Distance Text Sampling

Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, like natural language processing, information retrieval and computational biology. Sampled string…

Data Structures and Algorithms · Computer Science 2019-08-19 Simone Faro, Arianna Pavone, Francesco Pio Marino

Compressed Dictionary Matching on Run-Length Encoded Strings

Given a set of pattern strings $\mathcal{P}=\{P_1, P_2,\ldots, P_k\}$ and a text string $S$, the classic dictionary matching problem is to report all occurrences of each pattern in $S$. We study the dictionary problem in the compressed…

Data Structures and Algorithms · Computer Science 2025-09-04 Philip Bille, Inge Li Gørtz, Simon J. Puglisi, Simon R. Tarnow

Top-k String Auto-Completion with Synonyms

Auto-completion is one of the most prominent features of modern information systems. Existing auto-completion solutions provide suggestions based on the beginning of the character sequence currently being typed (i.e., the prefix). However,…

Information Retrieval · Computer Science 2016-11-24 Pengfei Xu, Jiaheng Lu

Pattern Masking for Dictionary Matching

In the Pattern Masking for Dictionary Matching (PMDM) problem, we are given a dictionary $\mathcal{D}$ of $d$ strings, each of length $\ell$, a query string $q$ of length $\ell$, and a positive integer $z$, and we are asked to compute a…

Data Structures and Algorithms · Computer Science 2024-03-11 Panagiotis Charalampopoulos, Huiping Chen, Peter Christen, Grigorios Loukides, Nadia Pisanti, Solon P. Pissis, Jakub Radoszewski

Optimal-Hash Exact String Matching Algorithms

String matching is the problem of finding all the occurrences of a pattern in a text. We propose improved versions of the fast family of string matching algorithms based on hashing $q$-grams. The improvement consists of considering minimal…
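The abstract does not give the algorithms' details. As a rough illustration of the shift idea behind q-gram-based matchers, here is a Horspool-style search whose shift is driven by the last $q$ characters of the current window. Unlike the hashing algorithms the paper describes, this sketch keys the shift table by the q-gram strings themselves (a Python `dict`), so there are no hash collisions to handle; the function name `qgram_shift_search` and the default `q=3` are assumptions.

```python
def qgram_shift_search(pattern: str, text: str, q: int = 3) -> list:
    """Report all start positions of pattern in text using a q-gram Horspool-style shift."""
    m, n = len(pattern), len(text)
    if m < q:
        raise ValueError("pattern must be at least q characters long")
    # Safe shift when the window's last q-gram does not occur in pattern[:m-1].
    default = m - q + 1
    # For each q-gram ending at pattern index i <= m-2, store the distance from i to the
    # pattern's end; later (rightmost) occurrences overwrite earlier ones.
    shift = {}
    for i in range(q - 1, m - 1):
        shift[pattern[i - q + 1 : i + 1]] = m - 1 - i
    occurrences = []
    j = m - 1  # text index aligned with the last pattern character
    while j < n:
        gram = text[j - q + 1 : j + 1]
        # Cheap filter on the last q-gram, then verify the whole window.
        if gram == pattern[m - q :] and text[j - m + 1 : j + 1] == pattern:
            occurrences.append(j - m + 1)
        j += shift.get(gram, default)
    return occurrences
```

For example, `qgram_shift_search("abra", "abracadabra", q=2)` returns `[0, 7]`. A larger $q$ makes it more likely that the window's last q-gram is absent from the pattern (so the default shift applies), at the cost of a smaller maximum shift $m-q+1$.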

Data Structures and Algorithms · Computer Science 2023-03-13 Thierry Lecroq

Compressed String Dictionaries

The problem of storing a set of strings (a string dictionary) in compact form appears naturally in many cases. While classically it has represented a small part of the whole data to be processed (e.g., for natural language processing…

Data Structures and Algorithms · Computer Science 2011-01-31 Nieves R. Brisaboa, Rodrigo Cánovas, Miguel A. Martínez-Prieto, Gonzalo Navarro

Efficient Pattern Matching on Binary Strings

The binary string matching problem consists of finding all the occurrences of a pattern in a text where both strings are built on a binary alphabet. This is an interesting problem in computer science, since binary data are omnipresent in…

Data Structures and Algorithms · Computer Science 2008-10-15 Simone Faro, Thierry Lecroq

Speeding Up String Matching by Weak Factor Recognition

String matching is the problem of finding all the substrings of a text which match a given pattern. It is one of the most investigated problems in computer science, mainly due to its very diverse applications in several fields. Recently,…

Data Structures and Algorithms · Computer Science 2017-07-04 Domenico Cantone, Simone Faro, Arianna Pavone

Detecting Structural Irregularity in Electronic Dictionaries Using Language Modeling

Dictionaries are often developed using tools that save to Extensible Markup Language (XML)-based standards. These standards often allow high-level repeating elements to represent lexical entries, and utilize descendants of these repeating…

Computation and Language · Computer Science 2016-02-18 Paul Rodrigues, David Zajic, David Doermann, Michael Bloodgood, Peng Ye

Mining Statistically Significant Substrings Based on the Chi-Square Measure

Given the vast reservoirs of data stored worldwide, efficient mining of data from a large information store has emerged as a great challenge. Many databases, such as those of intrusion detection systems, web-click records, player statistics,…

Databases · Computer Science 2010-03-09 Sourav Dutta, Arnab Bhattacharya

The Exact String Matching Problem: a Comprehensive Experimental Evaluation

This paper addresses the online exact string matching problem, which consists of finding all occurrences of a given pattern $p$ in a text $t$. It is an extensively studied problem in computer science, mainly due to its direct applications to…

Data Structures and Algorithms · Computer Science 2010-12-14 Simone Faro, Thierry Lecroq

At the Roots of Dictionary Compression: String Attractors

A well-known fact in the field of lossless text compression is that high-order entropy is a weak model when the input contains long repetitions. Motivated by this, decades of research have generated myriads of so-called dictionary…
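The abstract is truncated before the central definition. Following the definition of Kempa and Prezza as we understand it, a set $\Gamma$ of positions is a string attractor of $T$ if every substring of $T$ has at least one occurrence that contains some position of $\Gamma$. The brute-force checker below tests that property directly; it uses 0-based positions, runs in at least cubic time, and its name `is_string_attractor` is hypothetical.

```python
def is_string_attractor(text: str, gamma: set) -> bool:
    """Brute-force check that gamma (0-based positions) is a string attractor of text."""
    n = len(text)
    # Distinct substrings having at least one occurrence text[i..j] that contains a position of gamma.
    covered = set()
    for i in range(n):
        for j in range(i, n):
            if any(i <= p <= j for p in gamma):
                covered.add(text[i : j + 1])
    # gamma is an attractor iff every distinct substring of text is covered.
    return all(text[i : j + 1] in covered for i in range(n) for j in range(i, n))
```

For `abab`, the positions `{1, 2}` form an attractor, while `{1}` does not, because no occurrence of `a` contains position 1.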

Data Structures and Algorithms · Computer Science 2020-12-17 Dominik Kempa, Nicola Prezza

Suffix Stripping Problem as an Optimization Problem

Stemming (suffix stripping), an important part of modern information retrieval systems, is the task of finding the root word (stem) of a given cluster of words. Existing algorithms targeting this problem have been developed in a haphazard…

Information Retrieval · Computer Science 2013-12-25 B. P. Pande, Pawan Tamta, H. S. Dhami

Computing Matching Statistics on Repetitive Texts

Computing the matching statistics of a string $P[1..m]$ with respect to a text $T[1..n]$ is a fundamental problem with applications to genome sequence comparison. In this paper, we study the problem of computing the matching…
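Assuming the standard definition, the matching statistics give, for each position $i$ of $P$, the length of the longest prefix of $P[i..m]$ that occurs somewhere in $T$. The naive sketch below (0-based indices, hypothetical function name `matching_statistics`) relies on Python's substring test and on the fact that $MS[i+1] \ge MS[i]-1$; it is not the compressed-index method for repetitive texts studied in the paper.

```python
def matching_statistics(P: str, T: str) -> list:
    """ms[i] = length of the longest prefix of P[i:] that occurs as a substring of T."""
    ms = []
    length = 0
    for i in range(len(P)):
        # Dropping the first character of the previous match leaves a string
        # that still occurs in T, so we can resume from length - 1.
        length = max(length - 1, 0)
        # Extend one character at a time while the longer prefix still occurs in T.
        while i + length < len(P) and P[i : i + length + 1] in T:
            length += 1
        ms.append(length)
    return ms
```

For example, `matching_statistics("abcx", "zabcab")` returns `[3, 2, 1, 0]`.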

Data Structures and Algorithms · Computer Science 2022-01-14 Younan Gao

Probabilistic Threshold Indexing for Uncertain Strings

Strings form a fundamental data type in computer systems. String searching has been extensively studied since the inception of computer science. Increasingly many applications have to deal with imprecise strings or strings with fuzzy…

Databases · Computer Science 2015-09-30 Sharma V. Thankachan, Manish Patil, Rahul Shah, Sudip Biswas

Handling Massive N-Gram Datasets Efficiently

This paper deals with the two fundamental problems concerning the handling of large n-gram language models: indexing, that is compressing the n-gram strings and associated satellite data without compromising their retrieval speed; and…

Information Retrieval · Computer Science 2022-02-08 Giulio Ermanno Pibiri, Rossano Venturini

Engineering Small Space Dictionary Matching

The dictionary matching problem is to locate occurrences of any pattern among a set of patterns in a given text. Massive data sets abound and at the same time, there are many settings in which working space is extremely limited. We…

Data Structures and Algorithms · Computer Science 2013-01-29 Shoshana Marcus, Dina Sokol