Related papers: Simple, compact and robust approximate string dict…
Given a string $\sigma$ over alphabet $\Sigma$ and a grammar $G$ defined over the same alphabet, how many minimum number of repairs: insertions, deletions and substitutions are required to map $\sigma$ into a valid member of $G$ ? We…
The approximate string matching is a fundamental and recurrent problem that arises in most computer science fields. This problem can be defined as follows: Let $D=\{x_1,x_2,\ldots x_d\}$ be a set of $d$ words defined on an alphabet…
The problem of approximate string matching is important in many different areas such as computational biology, text processing and pattern recognition. A great effort has been made to design efficient algorithms addressing several variants…
The edit distance is a fundamental measure of sequence similarity, defined as the minimum number of character insertions, deletions, and substitutions needed to transform one string into the other. Given two strings of length at most $n$,…
Given a pair of strings, the problems of computing their Longest Common Subsequence and Edit Distance have been extensively studied for decades. For exact algorithms, LCS and Edit Distance (with character insertions and deletions) are…
The Swap-Insert Correction distance from a string $S$ of length $n$ to another string $L$ of length $m\geq n$ on the alphabet $[1..d]$ is the minimum number of insertions, and swaps of pairs of adjacent symbols, converting $S$ into $L$.…
We revisit the fundamental problem of dictionary look-up with mismatches. Given a set (dictionary) of $d$ strings of length $m$ and an integer $k$, we must preprocess it into a data structure to answer the following queries: Given a query…
Edit distance similarity search, also called approximate pattern matching, is a fundamental problem with widespread database applications. The goal of the problem is to preprocess $n$ strings of length $d$, to quickly answer queries $q$ of…
Given a dynamic set $K$ of $k$ strings of total length $n$ whose characters are drawn from an alphabet of size $\sigma$, a keyword dictionary is a data structure built on $K$ that provides locate, prefix search, and update operations on…
The edit distance between strings classically assigns unit cost to every character insertion, deletion, and substitution, whereas the Hamming distance only allows substitutions. In many real-life scenarios, insertions and deletions…
The problem of storing a set of strings --- a string dictionary --- in compact form appears naturally in many cases. While classically it has represented a small part of the whole data to be processed (e.g., for Natural Language processing…
Approximate string matching is the problem of finding all factors of a text t of length n that are at a distance at most k from a pattern x of length m. Approximate circular string matching is the problem of finding all factors of t that…
Given a set of strings, the shortest common superstring problem is to find the shortest possible string that contains all the input strings. The problem is NP-hard, but a lot of work has gone into designing approximation algorithms for…
In many applications, it is necessary to determine the similarity of two strings. A widely-used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one…
String consensus problems aim at finding a string that minimizes some given distance with respect to an input set of strings. In particular, in the Closest string problem, we are given a set of strings of equal length and a radius $d$. The…
Motivated by the imminent growth of massive, highly redundant genomic databases, we study the problem of compressing a string database while simultaneously supporting fast random access, substring extraction and pattern matching to the…
We present an algorithm for approximating the edit distance $\operatorname{ed}(x, y)$ between two strings $x$ and $y$ in time parameterized by the degree to which one of the strings $x$ satisfies a natural pseudorandomness property. The…
We propose an algorithm for approximative dictionary lookup, where altered strings are matched against reference forms. The algorithm makes use of a divergence function between strings -- broadly belonging to the family of edit distances;…
We present an algorithm for approximating the edit distance between two strings of length $n$ in time $n^{1+\varepsilon}$ up to a constant factor, for any $\varepsilon>0$. Our result completes a research direction set forth in the recent…
The edit distance $ed(X,Y)$ of two strings $X,Y\in \Sigma^*$ is the minimum number of character edits (insertions, deletions, and substitutions) needed to transform $X$ into $Y$. Its weighted counterpart $ed^w(X,Y)$ minimizes the total cost…