MSARC: Multiple Sequence Alignment by Residue Clustering
Abstract
Progressive methods offer efficient and reasonably good solutions to the multiple sequence alignment problem. However, the resulting alignments are biased by guide-trees, especially for relatively distant sequences. We propose MSARC, a new graph-clustering based algorithm that aligns sequence sets without guide-trees. Experiments on the BAliBASE dataset show that MSARC achieves alignment quality similar to that of the best progressive methods and substantially higher than that of other non-progressive algorithms. Furthermore, MSARC outperforms all other methods on sequence sets whose evolutionary distances are difficult to represent with a phylogenetic tree. These datasets are the most exposed to the guide-tree bias of alignments. MSARC is available at http://bioputer.mimuw.edu.pl/msarc
Cite
@article{arxiv.1307.7844,
title = {MSARC: Multiple Sequence Alignment by Residue Clustering},
author = {Michał Modzelewski and Norbert Dojer},
journal = {arXiv preprint arXiv:1307.7844},
year = {2013}
}
Comments
Peer-reviewed and presented as part of the 13th Workshop on Algorithms in Bioinformatics (WABI2013)