A Fast Randomized Algorithm for Finding the Maximal Common Subsequences

Jin Cao; Dewei Zhong

Subjects: Data Structures and Algorithms; Artificial Intelligence; Computational Complexity; Machine Learning. Version 1, 2020-09-09.

Abstract

Finding the common subsequences of $L$ strings has many applications in bioinformatics, computational linguistics, and information retrieval. A well-known result states that finding a Longest Common Subsequence (LCS) of $L$ strings is NP-hard; i.e., the running time of known exact algorithms grows exponentially in $L$. In this paper, we develop a randomized algorithm, referred to as Random-MCS, for finding a random instance of a Maximal Common Subsequence (MCS) of multiple strings. A common subsequence is maximal if inserting any character into it no longer yields a common subsequence. An LCS is the special case of an MCS with the maximum possible length. We show that the complexity of our algorithm is linear in $L$, which makes it suitable for large $L$. Furthermore, we study the occurrence probability of a single MCS instance and demonstrate, through both theoretical and experimental studies, that the longest subsequence obtained from multiple runs of Random-MCS often yields a solution to the LCS problem.
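To make the MCS definition concrete, here is a minimal Python sketch. It is not the paper's Random-MCS algorithm, whose details are not given in this abstract. It is a deliberately naive stand-in: it starts from the empty string and keeps applying a randomly chosen valid single-character insertion until none exists, which by the definition above leaves a maximal common subsequence. Because each step re-checks every possible insertion, it is only suitable for short strings. The function names `random_maximal_common_subsequence` and `best_of_runs` are hypothetical, and `best_of_runs` only mimics the "longest of multiple runs" idea in spirit.

```python
import random


def is_subsequence(w, s):
    """Return True if w is a subsequence of s (`in` advances the shared iterator)."""
    it = iter(s)
    return all(ch in it for ch in w)


def is_common(w, strings):
    """Return True if w is a subsequence of every string in `strings`."""
    return all(is_subsequence(w, s) for s in strings)


def random_maximal_common_subsequence(strings, rng=random):
    """Hypothetical naive sketch, NOT the paper's Random-MCS.

    Grow w by random single-character insertions that keep it a common
    subsequence; stop when no insertion works, i.e. w is maximal.
    """
    # Any character of a common subsequence must occur in strings[0].
    alphabet = sorted(set(strings[0]))
    w = ""
    while True:
        candidates = [
            w[:i] + c + w[i:]
            for i in range(len(w) + 1)
            for c in alphabet
            if is_common(w[:i] + c + w[i:], strings)
        ]
        if not candidates:
            return w  # no valid insertion remains, so w is an MCS
        w = rng.choice(candidates)


def best_of_runs(strings, runs=20, seed=0):
    """Return the longest MCS found over several independent random runs."""
    rng = random.Random(seed)
    return max(
        (random_maximal_common_subsequence(strings, rng) for _ in range(runs)),
        key=len,
    )
```

For example, `best_of_runs(["ABCBDAB", "BDCABA"])` returns a common subsequence of length at most 4, the LCS length of this pair. In general, a single run may stop at a maximal common subsequence that is shorter than an LCS, which is why taking the longest result over several runs helps.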

Keywords

string algorithms randomized algorithm sequence design

Cite

@article{arxiv.2009.03352,
  title  = {A Fast Randomized Algorithm for Finding the Maximal Common Subsequences},
  author = {Jin Cao and Dewei Zhong},
  journal= {arXiv preprint arXiv:2009.03352},
  year   = {2020}
}

Comments

9 pages