Related papers: Pivot Selection for Median String Problem

Assessing the best edit in perturbation-based iterative refinement algorithms to compute the median string

Strings are a natural representation of biological data such as DNA, RNA and protein sequences. The problem of finding a string that summarizes a set of sequences has direct application in relative compression algorithms for genome and…

Data Structures and Algorithms · Computer Science · 2019-12-06 · P. Mirabal, J. Abreu, D. Seco

Approximating the Median under the Ulam Metric

We study approximation algorithms for variants of the median string problem, which asks for a string that minimizes the sum of edit distances from a given set of $m$ strings of length $n$. Only the straightforward $2$-approximation…

Data Structures and Algorithms · Computer Science · 2020-11-03 · Diptarka Chakraborty, Debarati Das, Robert Krauthgamer

Maximizing Diversity in (near-)Median String Selection

Given a set of strings over a specified alphabet, identifying a median or consensus string that minimizes the total distance to all input strings is a fundamental data aggregation problem. When the Hamming distance is considered as the…

Data Structures and Algorithms · Computer Science · 2026-02-11 · Diptarka Chakraborty, Rudrayan Kundu, Nidhi Purohit, Aravinda Kanchana Ruwanpathirana

Edit Distance Cannot Be Computed in Strongly Subquadratic Time (unless SETH is false)

The edit distance (a.k.a. the Levenshtein distance) between two strings is defined as the minimum number of insertions, deletions or substitutions of symbols needed to transform one string into another. The problem of computing the edit…

Computational Complexity · Computer Science · 2017-08-17 · Arturs Backurs, Piotr Indyk

The Exact Closest String Problem as a Constraint Satisfaction Problem

We report (to our knowledge) the first evaluation of Constraint Satisfaction as a computational framework for solving closest string problems. We show that careful consideration of symbol occurrences can provide search heuristics that…

Artificial Intelligence · Computer Science · 2010-05-04 · Tom Kelsey, Lars Kotthoff

On The Closest String and Substring Problems

The problem of finding a center string that is "close" to every given string arises and has many applications in computational biology and coding theory. This problem has two versions: the Closest String problem and the Closest Substring…

Computational Engineering, Finance, and Science · Computer Science · 2007-05-23 · Ming Li, Bin Ma, Lusheng Wang

Limitations of Mean-Based Algorithms for Trace Reconstruction at Small Distance

Trace reconstruction considers the task of recovering an unknown string $x \in \{0,1\}^n$ given a number of independent "traces", i.e., subsequences of $x$ obtained by randomly and independently deleting every symbol of $x$ with some…

Probability · Mathematics · 2022-03-16 · Elena Grigorescu, Madhu Sudan, Minshen Zhu

Improved Algorithms for Approximate String Matching (Extended Abstract)

The problem of approximate string matching is important in many different areas such as computational biology, text processing and pattern recognition. A great effort has been made to design efficient algorithms addressing several variants…

Data Structures and Algorithms · Computer Science · 2008-07-29 · Dimitris Papamichail, Georgios Papamichail

Space Efficient Deterministic Approximation of String Measures

We study approximation algorithms for the following three string measures that are widely used in practice: edit distance (ED), longest common subsequence (LCS), and longest increasing sequence (LIS). All three problems can be solved…

Data Structures and Algorithms · Computer Science · 2020-07-28 · Kuan Cheng, Zhengzhong Jin, Xin Li, Yu Zheng

Learning string edit distance

In many applications, it is necessary to determine the similarity of two strings. A widely-used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one…

cmp-lg · Computer Science · 2008-02-03 · Eric Sven Ristad, Peter N. Yianilos

Quicksort with median of medians is considered practical

The linear pivot selection algorithm, known as median-of-medians, makes the worst case complexity of quicksort be $\mathrm{O}(n\ln n)$. Nevertheless, it has often been said that this algorithm is too expensive to use in quicksort. In this…

Data Structures and Algorithms · Computer Science · 2016-08-18 · Noriyuki Kurosawa

Faster Approximate String Matching for Short Patterns

We study the classical approximate string matching problem, that is, given strings $P$ and $Q$ and an error threshold $k$, find all ending positions of substrings of $Q$ whose edit distance to $P$ is at most $k$. Let $P$ and $Q$ have…

Data Structures and Algorithms · Computer Science · 2011-03-21 · Philip Bille

Metric $1$-median selection: Query complexity vs. approximation ratio

Consider the problem of finding a point in a metric space $(\{1,2,\ldots,n\},d)$ with the minimum average distance to other points. We show that this problem has no deterministic $o(n^{1+1/(h-1)})$-query $(2h-\Omega(1))$-approximation…

Computational Complexity · Computer Science · 2015-09-21 · Ching-Lueh Chang

A Three-Stage Algorithm for the Closest String Problem on Artificial and Real Gene Sequences

The Closest String Problem is an NP-hard problem that aims to find a string that has the minimum distance from all sequences that belong to the given set of strings. Its applications can be found in coding theory, computational biology, and…

Artificial Intelligence · Computer Science · 2024-07-19 · Alireza Abdi, Marko Djukanovic, Hesam Tahmasebi Boldaji, Hadis Salehi, Aleksandar Kartelj

Approximate Trace Reconstruction via Median String (in Average-Case)

We consider an approximate version of the trace reconstruction problem, where the goal is to recover an unknown string $s\in\{0,1\}^n$ from $m$ traces (each trace is generated independently by passing $s$ through a probabilistic…

Data Structures and Algorithms · Computer Science · 2021-07-21 · Diptarka Chakraborty, Debarati Das, Robert Krauthgamer

Quadratic Conditional Lower Bounds for String Problems and Dynamic Time Warping

Classic similarity measures of strings are longest common subsequence and Levenshtein distance (i.e., the classic edit distance). A classic similarity measure of curves is dynamic time warping. These measures can be computed by simple…

Computational Complexity · Computer Science · 2015-04-06 · Karl Bringmann, Marvin Künnemann

Algorithm to derive shortest edit script using Levenshtein distance algorithm

String similarity, longest common subsequence, and shortest edit script are a triplet of related problems. Different algorithms exist to generate an edit script by solving the longest common subsequence problem. This…

Data Structures and Algorithms · Computer Science · 2022-08-19 · P. Prakash Maria Liju

Lightweight Fingerprints for Fast Approximate Keyword Matching Using Bitwise Operations

We aim to speed up approximate keyword matching by storing a lightweight, fixed-size block of data for each string, called a fingerprint. These work in a similar way to hash values; however, they can be also used for matching with errors.…

Data Structures and Algorithms · Computer Science · 2017-11-27 · Aleksander Cisłak, Szymon Grabowski

Breaking the Variance: Approximating the Hamming Distance in $\tilde O(1/\epsilon)$ Time Per Alignment

The algorithmic task of computing the Hamming distance between a given pattern of length $m$ and each location in a text of length $n$ is one of the most fundamental tasks in string algorithms. Unfortunately, there is evidence…

Data Structures and Algorithms · Computer Science · 2015-12-15 · Tsvi Kopelowitz, Ely Porat

Robust Metric Learning by Smooth Optimization

Most existing distance metric learning methods assume perfect side information that is usually given in pairwise or triplet constraints. Instead, in many real-world applications, the constraints are derived from side information, such as…

Machine Learning · Computer Science · 2012-03-19 · Kaizhu Huang, Rong Jin, Zenglin Xu, Cheng-Lin Liu