Related papers: Pivot Selection for Median String Problem
Strings are a natural representation of biological data such as DNA, RNA and protein sequences. The problem of finding a string that summarizes a set of sequences has direct application in relative compression algorithms for genome and…
We study approximation algorithms for variants of the \emph{median string} problem, which asks for a string that minimizes the sum of edit distances from a given set of $m$ strings of length $n$. Only the straightforward $2$-approximation…
Given a set of strings over a specified alphabet, identifying a median or consensus string that minimizes the total distance to all input strings is a fundamental data aggregation problem. When the Hamming distance is considered as the…
The edit distance (a.k.a. the Levenshtein distance) between two strings is defined as the minimum number of insertions, deletions or substitutions of symbols needed to transform one string into another. The problem of computing the edit…
We report (to our knowledge) the first evaluation of Constraint Satisfaction as a computational framework for solving closest string problems. We show that careful consideration of symbol occurrences can provide search heuristics that…
The problem of finding a center string that is "close" to every given string arises in computational biology and coding theory, where it has many applications. This problem has two versions: the Closest String problem and the Closest Substring…
Trace reconstruction considers the task of recovering an unknown string $x \in \{0,1\}^n$ given a number of independent "traces", i.e., subsequences of $x$ obtained by randomly and independently deleting every symbol of $x$ with some…
The problem of approximate string matching is important in many different areas such as computational biology, text processing and pattern recognition. A great effort has been made to design efficient algorithms addressing several variants…
We study approximation algorithms for the following three string measures that are widely used in practice: edit distance (ED), longest common subsequence (LCS), and longest increasing subsequence (LIS). All three problems can be solved…
In many applications, it is necessary to determine the similarity of two strings. A widely used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one…
The linear-time pivot selection algorithm, known as median-of-medians, brings the worst-case complexity of quicksort down to $\mathrm{O}(n\ln n)$. Nevertheless, it has often been said that this algorithm is too expensive to use in quicksort. In this…
We study the classical approximate string matching problem, that is, given strings $P$ and $Q$ and an error threshold $k$, find all ending positions of substrings of $Q$ whose edit distance to $P$ is at most $k$. Let $P$ and $Q$ have…
Consider the problem of finding a point in a metric space $(\{1,2,\ldots,n\},d)$ with the minimum average distance to other points. We show that this problem has no deterministic $o(n^{1+1/(h-1)})$-query $(2h-\Omega(1))$-approximation…
The Closest String Problem is an NP-hard problem that aims to find a string that has the minimum distance from all sequences that belong to the given set of strings. Its applications can be found in coding theory, computational biology, and…
We consider an \emph{approximate} version of the trace reconstruction problem, where the goal is to recover an unknown string $s\in\{0,1\}^n$ from $m$ traces (each trace is generated independently by passing $s$ through a probabilistic…
Classic similarity measures of strings are longest common subsequence and Levenshtein distance (i.e., the classic edit distance). A classic similarity measure of curves is dynamic time warping. These measures can be computed by simple…
String similarity, longest common subsequence, and shortest edit scripts form a triplet of closely related problems. Different algorithms exist to generate an edit script by solving the longest common subsequence problem. This…
We aim to speed up approximate keyword matching by storing a lightweight, fixed-size block of data for each string, called a fingerprint. Fingerprints work in a similar way to hash values; however, they can also be used for matching with errors.…
Computing the Hamming distance between a given pattern of length $m$ and each location in a text of length $n$ is one of the most fundamental tasks in string algorithms. Unfortunately, there is evidence…
Most existing distance metric learning methods assume perfect side information that is usually given in pairwise or triplet constraints. Instead, in many real-world applications, the constraints are derived from side information, such as…