Related papers: Faster Algorithm of String Comparison

Improved Algorithms for Approximate String Matching (Extended Abstract)

The problem of approximate string matching is important in many different areas such as computational biology, text processing and pattern recognition. A great effort has been made to design efficient algorithms addressing several variants…

Data Structures and Algorithms · Computer Science 2008-07-29 Dimitris Papamichail, Georgios Papamichail

Approximating Edit Distance in the Fully Dynamic Model

The edit distance is a fundamental measure of sequence similarity, defined as the minimum number of character insertions, deletions, and substitutions needed to transform one string into the other. Given two strings of length at most $n$,…

Data Structures and Algorithms · Computer Science 2023-07-17 Tomasz Kociumaka, Anish Mukherjee, Barna Saha

$\tilde{O}(n+\mathrm{poly}(k))$-time Algorithm for Bounded Tree Edit Distance

Computing the edit distance of two strings is one of the most basic problems in computer science and combinatorial optimization. Tree edit distance is a natural generalization of edit distance in which the task is to compute a measure of…

Data Structures and Algorithms · Computer Science 2022-09-16 Debarati Das, Jacob Gilbert, MohammadTaghi Hajiaghayi, Tomasz Kociumaka, Barna Saha, Hamed Saleh

Unified Compression-Based Acceleration of Edit-Distance Computation

The edit distance problem is a classical fundamental problem in computer science in general, and in combinatorial pattern matching in particular. The standard dynamic programming solution for this problem computes the edit-distance between…

Data Structures and Algorithms · Computer Science 2016-10-05 Danny Hermelin, Gad M. Landau, Shir Landau, Oren Weimann

Sublinear Algorithms for Gap Edit Distance

The edit distance is a way of quantifying how similar two strings are to one another by counting the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. A simple dynamic…

Computational Complexity · Computer Science 2019-10-03 Elazar Goldenberg, Robert Krauthgamer, Barna Saha

Streaming Algorithms For Computing Edit Distance Without Exploiting Suffix Trees

The edit distance is a way of quantifying how similar two strings are to one another by counting the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. In this paper we…

Data Structures and Algorithms · Computer Science 2016-07-14 Diptarka Chakraborty, Elazar Goldenberg, Michal Koucký

Reducing approximate Longest Common Subsequence to approximate Edit Distance

Given a pair of strings, the problems of computing their Longest Common Subsequence and Edit Distance have been extensively studied for decades. For exact algorithms, LCS and Edit Distance (with character insertions and deletions) are…

Data Structures and Algorithms · Computer Science 2019-04-12 Aviad Rubinstein, Zhao Song

On Practical Accuracy of Edit Distance Approximation Algorithms

The edit distance is a basic string similarity measure used in many applications such as text mining, signal processing, bioinformatics, and so on. However, the computational cost can be a problem when we repeat many distance calculations…

Data Structures and Algorithms · Computer Science 2017-01-24 Hiroyuki Hanada, Mineichi Kudo, Atsuyoshi Nakamura

Indexed Dynamic Programming to boost Edit Distance and LCSS Computation

There are efficient dynamic programming solutions to the computation of the Edit Distance from $S\in[1..\sigma]^n$ to $T\in[1..\sigma]^m$, for many natural subsets of edit operations, typically in time within $O(nm)$ in the worst-case over…

Information Retrieval · Computer Science 2018-06-13 Jérémy Barbay, Andrés Olivares

How Compression and Approximation Affect Efficiency in String Distance Measures

Real-world data often comes in compressed form. Analyzing compressed data directly (without decompressing it) can save space and time by orders of magnitude. In this work, we focus on fundamental sequence comparison problems and try to…

Data Structures and Algorithms · Computer Science 2021-12-14 Arun Ganesh, Tomasz Kociumaka, Andrea Lincoln, Barna Saha

Edit distance similarity search, also called approximate pattern matching, is a fundamental problem with widespread database applications. The goal of the problem is to preprocess $n$ strings of length $d$, to quickly answer queries $q$ of…

Data Structures and Algorithms · Computer Science 2020-07-10 Samuel McCauley

MinJoin: Efficient Edit Similarity Joins via Local Hash Minima

We study the problem of computing similarity joins under edit distance on a set of strings. Edit similarity joins is a fundamental problem in databases, data mining and bioinformatics. It finds important applications in data cleaning and…

Databases · Computer Science 2019-05-30 Haoyu Zhang, Qin Zhang

Does Preprocessing help in Fast Sequence Comparisons?

We study edit distance computation with preprocessing: the preprocessing algorithm acts on each string separately, and then the query algorithm takes as input the two preprocessed strings. This model is inspired by scenarios where we would…

Data Structures and Algorithms · Computer Science 2021-08-23 Elazar Goldenberg, Aviad Rubinstein, Barna Saha

$LCSk$++: Practical similarity metric for long strings

In this paper we present $LCSk$++: a new metric for measuring the similarity of long strings, and provide an algorithm for its efficient computation. With ever increasing size of strings occurring in practice, e.g. large genomes of plants…

Data Structures and Algorithms · Computer Science 2019-08-27 Filip Pavetić, Goran Žužić, Mile Šikić

Many Flavors of Edit Distance

Several measures exist for string similarity, including notable ones like the edit distance and the indel distance. The former measures the count of insertions, deletions, and substitutions required to transform one string into another,…

Data Structures and Algorithms · Computer Science 2024-10-15 Sudatta Bhattacharya, Sanjana Dey, Elazar Goldenberg, Michal Koucký

Average-Case Optimal Approximate Circular String Matching

Approximate string matching is the problem of finding all factors of a text t of length n that are at a distance at most k from a pattern x of length m. Approximate circular string matching is the problem of finding all factors of t that…

Data Structures and Algorithms · Computer Science 2016-04-26 Carl Barton, Costas S. Iliopoulos, Solon P. Pissis

Faster Algorithms for Bounded Tree Edit Distance

Tree edit distance is a well-studied measure of dissimilarity between rooted trees with node labels. It can be computed in $O(n^3)$ time [Demaine, Mozes, Rossman, and Weimann, ICALP 2007], and fine-grained hardness results suggest that the…

Data Structures and Algorithms · Computer Science 2021-06-11 Shyan Akmal, Ce Jin

Faster Sublinear-Time Edit Distance

We study the fundamental problem of approximating the edit distance of two strings. After an extensive line of research led to the development of a constant-factor approximation algorithm in almost-linear time, recent years have witnessed a…

Data Structures and Algorithms · Computer Science 2023-12-05 Karl Bringmann, Alejandro Cassis, Nick Fischer, Tomasz Kociumaka

Faster Approximate Pattern Matching: A Unified Approach

Approximate pattern matching is a natural and well-studied problem on strings: Given a text $T$, a pattern $P$, and a threshold $k$, find (the starting positions of) all substrings of $T$ that are at distance at most $k$ from $P$. We…

Data Structures and Algorithms · Computer Science 2020-11-17 Panagiotis Charalampopoulos, Tomasz Kociumaka, Philip Wellnitz

Weighted Edit Distance Computation: Strings, Trees and Dyck

Given two strings of length $n$ over alphabet $\Sigma$, and an upper bound $k$ on their edit distance, the algorithm of Myers (Algorithmica'86) and Landau and Vishkin (JCSS'88) computes the unweighted string edit distance in…

Data Structures and Algorithms · Computer Science 2023-02-09 Debarati Das, Jacob Gilbert, MohammadTaghi Hajiaghayi, Tomasz Kociumaka, Barna Saha