English
Related papers

Related papers: Simple, compact and robust approximate string dict…

200 papers

Given a string $\sigma$ over alphabet $\Sigma$ and a grammar $G$ defined over the same alphabet, how many minimum number of repairs: insertions, deletions and substitutions are required to map $\sigma$ into a valid member of $G$ ? We…

Data Structures and Algorithms · Computer Science 2013-11-13 Barna Saha

The approximate string matching is a fundamental and recurrent problem that arises in most computer science fields. This problem can be defined as follows: Let $D=\{x_1,x_2,\ldots x_d\}$ be a set of $d$ words defined on an alphabet…

Data Structures and Algorithms · Computer Science 2017-01-31 Ibrahim Chegrane

The problem of approximate string matching is important in many different areas such as computational biology, text processing and pattern recognition. A great effort has been made to design efficient algorithms addressing several variants…

Data Structures and Algorithms · Computer Science 2008-07-29 Dimitris Papamichail , Georgios Papamichail

The edit distance is a fundamental measure of sequence similarity, defined as the minimum number of character insertions, deletions, and substitutions needed to transform one string into the other. Given two strings of length at most $n$,…

Data Structures and Algorithms · Computer Science 2023-07-17 Tomasz Kociumaka , Anish Mukherjee , Barna Saha

Given a pair of strings, the problems of computing their Longest Common Subsequence and Edit Distance have been extensively studied for decades. For exact algorithms, LCS and Edit Distance (with character insertions and deletions) are…

Data Structures and Algorithms · Computer Science 2019-04-12 Aviad Rubinstein , Zhao Song

The Swap-Insert Correction distance from a string $S$ of length $n$ to another string $L$ of length $m\geq n$ on the alphabet $[1..d]$ is the minimum number of insertions, and swaps of pairs of adjacent symbols, converting $S$ into $L$.…

Data Structures and Algorithms · Computer Science 2015-06-30 Jérémy Barbay , Pablo Pérez-Lantero

We revisit the fundamental problem of dictionary look-up with mismatches. Given a set (dictionary) of $d$ strings of length $m$ and an integer $k$, we must preprocess it into a data structure to answer the following queries: Given a query…

Data Structures and Algorithms · Computer Science 2018-06-27 Paweł Gawrychowski , Gad M. Landau , Tatiana Starikovskaya

Edit distance similarity search, also called approximate pattern matching, is a fundamental problem with widespread database applications. The goal of the problem is to preprocess $n$ strings of length $d$, to quickly answer queries $q$ of…

Data Structures and Algorithms · Computer Science 2020-07-10 Samuel McCauley

Given a dynamic set $K$ of $k$ strings of total length $n$ whose characters are drawn from an alphabet of size $\sigma$, a keyword dictionary is a data structure built on $K$ that provides locate, prefix search, and update operations on…

Data Structures and Algorithms · Computer Science 2020-10-08 Kazuya Tsuruta , Dominik Köppl , Shunsuke Kanda , Yuto Nakashima , Shunsuke Inenaga , Hideo Bannai , Masayuki Takeda

The edit distance between strings classically assigns unit cost to every character insertion, deletion, and substitution, whereas the Hamming distance only allows substitutions. In many real-life scenarios, insertions and deletions…

Data Structures and Algorithms · Computer Science 2026-02-23 Elazar Goldenberg , Tomasz Kociumaka , Robert Krauthgamer , Barna Saha

The problem of storing a set of strings --- a string dictionary --- in compact form appears naturally in many cases. While classically it has represented a small part of the whole data to be processed (e.g., for Natural Language processing…

Data Structures and Algorithms · Computer Science 2011-01-31 Nieves R. Brisaboa , Rodrigo Cánovas , Miguel A. Martínez-Prieto , Gonzalo Navarro

Approximate string matching is the problem of finding all factors of a text t of length n that are at a distance at most k from a pattern x of length m. Approximate circular string matching is the problem of finding all factors of t that…

Data Structures and Algorithms · Computer Science 2016-04-26 Carl Barton , Costas S. Iliopoulos , Solon P. Pissis

Given a set of strings, the shortest common superstring problem is to find the shortest possible string that contains all the input strings. The problem is NP-hard, but a lot of work has gone into designing approximation algorithms for…

Data Structures and Algorithms · Computer Science 2019-12-04 Jarno Alanko , Tuukka Norri

In many applications, it is necessary to determine the similarity of two strings. A widely-used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one…

cmp-lg · Computer Science 2008-02-03 Eric Sven Ristad , Peter N. Yianilos

String consensus problems aim at finding a string that minimizes some given distance with respect to an input set of strings. In particular, in the Closest string problem, we are given a set of strings of equal length and a radius $d$. The…

Data Structures and Algorithms · Computer Science 2025-07-29 Estéban Gabory , Laurent Bulteau , Gabriele Fici , Hilde Verbeek

Motivated by the imminent growth of massive, highly redundant genomic databases, we study the problem of compressing a string database while simultaneously supporting fast random access, substring extraction and pattern matching to the…

Data Structures and Algorithms · Computer Science 2012-11-01 Travis Gagie , Paweł Gawrychowski , Christopher Hoobin , Simon J. Puglisi

We present an algorithm for approximating the edit distance $\operatorname{ed}(x, y)$ between two strings $x$ and $y$ in time parameterized by the degree to which one of the strings $x$ satisfies a natural pseudorandomness property. The…

Data Structures and Algorithms · Computer Science 2018-11-13 William Kuszmaul

We propose an algorithm for approximative dictionary lookup, where altered strings are matched against reference forms. The algorithm makes use of a divergence function between strings -- broadly belonging to the family of edit distances;…

Computation and Language · Computer Science 2021-09-03 Pascal Vaillant

We present an algorithm for approximating the edit distance between two strings of length $n$ in time $n^{1+\varepsilon}$ up to a constant factor, for any $\varepsilon>0$. Our result completes a research direction set forth in the recent…

Data Structures and Algorithms · Computer Science 2022-07-18 Alexandr Andoni , Negev Shekel Nosatzki

The edit distance $ed(X,Y)$ of two strings $X,Y\in \Sigma^*$ is the minimum number of character edits (insertions, deletions, and substitutions) needed to transform $X$ into $Y$. Its weighted counterpart $ed^w(X,Y)$ minimizes the total cost…

Data Structures and Algorithms · Computer Science 2025-07-04 Itai Boneh , Egor Gorbachev , Tomasz Kociumaka
‹ Prev 1 2 3 10 Next ›