Related papers: On The Closest String and Substring Problems
We study the fundamental problem of finding the best string to represent a given set, in the form of the Closest String problem: Given a set $X \subseteq \Sigma^d$ of $n$ strings, find the string $x^*$ minimizing the radius of the smallest…
Finding an Approximate Longest Common Substring (ALCS) within a given set $S=\{s_1,s_2,\ldots,s_m\}$ of $m \ge 2$ strings is a key problem in computational biology, with applications such as identifying related mutations across multiple genetic sequences. We…
The problem of approximate string matching is important in many different areas such as computational biology, text processing and pattern recognition. A great effort has been made to design efficient algorithms addressing several variants…
The Shortest Common Superstring problem (SCS) asks, for a set of strings $S = \{s_1,\ldots,s_n\}$, for a minimum-length string that contains every $s_i$, $1 \le i \le n$, as a substring. While a $2+11/30$ approximation ratio algorithm has recently…
In this paper we consider the $p$-Norm Hamming Centroid problem which asks to determine whether some given binary strings have a centroid with a bound on the $p$-norm of its Hamming distances to the strings. Specifically, given a set of…
Approximate string matching is a fundamental and recurrent problem that arises in most computer science fields. This problem can be defined as follows: Let $D=\{x_1,x_2,\ldots,x_d\}$ be a set of $d$ words defined on an alphabet…
We report (to our knowledge) the first evaluation of Constraint Satisfaction as a computational framework for solving closest string problems. We show that careful consideration of symbol occurrences can provide search heuristics that…
The Closest String Problem is an NP-hard problem that aims to find a string that has the minimum distance from all sequences that belong to the given set of strings. Its applications can be found in coding theory, computational biology, and…
The Shortest Common Superstring (SCS) problem is a fundamental task in sequence analysis. In genome assembly, however, the double-stranded nature of DNA implies that each fragment may occur either in its original orientation or as its…
The problem of finding the longest common subsequence (LCS) is one of the fundamental problems in computer science, with applications in fields such as computational biology, text processing, information retrieval, data compression, etc.…
The closest string problem is an NP-hard problem whose task is to find a string that minimizes the maximum Hamming distance to a given set of strings. This can be reduced to an integer program (IP). However, to date, there exists no known…
This study investigates whether reoptimization can help in solving the closest substring problem. We consider the following reoptimization scenario. Suppose we have an optimal l-length closest substring of a given set of sequences…
Many problems in bioinformatics are about finding strings that approximately represent a collection of given strings. We look at more general problems where some input strings can be classified as outliers. The Close to Most Strings problem…
In the Shortest Superstring problem, we are given a set of strings and ask for a common superstring with the minimum number of characters. The Shortest Superstring problem is NP-hard and several constant-factor approximation…
String consensus problems aim at finding a string that minimizes some given distance with respect to an input set of strings. In particular, in the Closest String problem, we are given a set of strings of equal length and a radius $d$. The…
This paper investigates the approximability of the Longest Common Subsequence (LCS) problem. The fastest algorithm for solving the LCS problem exactly runs in essentially quadratic time in the length of the input, and it is known that under…
The shortest common superstring and the shortest common supersequence are two well-studied problems having a wide range of applications. In this paper we consider both problems with resource constraints, denoted as the…
The Longest Common Subsequence (LCS) is a fundamental string similarity measure, and computing the LCS of two strings is a classic algorithms question. A textbook dynamic programming algorithm gives an exact algorithm in quadratic time, and…
The Longest Common Subsequence (LCS) of two strings is a fundamental string similarity measure with a classical dynamic programming solution taking quadratic time. Despite significant efforts, little progress was made in improving the…
In the Maximum Duo-Preservation String Mapping problem we are given two strings and wish to map the letters of the former to the letters of the latter so as to maximise the number of duos. A duo is a pair of consecutive letters that is…
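As a point of reference for the Closest String objective that recurs in several of the abstracts above (find a string minimizing the maximum Hamming distance to a set of equal-length strings), the following is a minimal brute-force sketch. It is not taken from any of the listed papers; the function names `hamming`, `radius`, and `closest_string_bruteforce` are hypothetical, and the exhaustive search is exponential in the string length, so it is only meant to illustrate the definition on tiny inputs.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def radius(center: str, strings: list[str]) -> int:
    """Maximum Hamming distance from `center` to any input string."""
    return max(hamming(center, s) for s in strings)


def closest_string_bruteforce(strings: list[str]) -> tuple[str, int]:
    """Try every candidate center and keep one with the smallest radius.

    Assumption for this sketch: candidates use only symbols that occur
    in the input. That loses nothing, because a symbol absent from the
    input mismatches every string at that position.
    """
    if not strings:
        raise ValueError("need at least one input string")
    d = len(strings[0])
    alphabet = sorted(set("".join(strings)))
    best, best_r = None, d + 1
    for cand in product(alphabet, repeat=d):  # |alphabet|^d candidates
        center = "".join(cand)
        r = radius(center, strings)
        if r < best_r:
            best, best_r = center, r
    return best, best_r


if __name__ == "__main__":
    center, r = closest_string_bruteforce(["aab", "abb", "bab"])
    print(center, r)
```

For the three input strings in the usage example, no single string matches all of them exactly, so the optimal radius is 1. The NP-hardness noted in the abstracts is why practical approaches rely on approximation, parameterized algorithms, or integer programming rather than enumeration like this.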