
The most parsimonious tree for random data

Populations and Evolution 2014-06-03 v1

Abstract

Applying a method to reconstruct a phylogenetic tree from random data provides a way to detect whether that method has an inherent bias towards certain tree "shapes". For maximum parsimony, applied to a sequence of random 2-state data, each possible binary phylogenetic tree has exactly the same distribution for its parsimony score. Despite this pleasing and slightly surprising symmetry, some binary phylogenetic trees are more likely than others to be a most parsimonious (MP) tree for a sequence of k such characters, as we show. For k = 2 and unrooted binary trees on six taxa, any tree with a caterpillar shape has a higher chance of being an MP tree than any tree with a symmetric shape. On the other hand, if we take any two binary trees, on any number of taxa, we prove that this bias between the two trees vanishes as the number of characters grows. However, again there is a twist: MP trees on six taxa are more likely to have certain shapes than a uniform distribution on binary phylogenetic trees predicts, and this difference does not appear to dissipate as k grows.
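The six-taxon claims can be checked by brute force, sketched here in Python. The sketch rests on several assumptions that the abstract does not spell out. Each of the 64 two-state patterns is treated as equally likely, which means each leaf is 0 or 1 with probability 1/2, independently. Trees are scored with Fitch's algorithm. A tree counts as MP whenever it attains the minimum total score over all 105 unrooted binary trees, with ties included. The names CATERPILLAR, SNOWFLAKE (the symmetric three-cherry shape), `mp_count` and the other helpers are hypothetical illustrations, not the authors' code.

```python
from collections import Counter
from functools import lru_cache
from itertools import product

# Hypothetical taxon labels; any six distinct names would do.
TAXA = ("a", "b", "c", "d", "e", "f")
# All 2**6 = 64 two-state characters, assumed equally likely
# (each leaf independently 0 or 1 with probability 1/2).
PATTERNS = list(product((0, 1), repeat=len(TAXA)))

# Two illustrative unrooted trees written as rooted nested tuples;
# the Fitch score does not depend on where the root is placed.
CATERPILLAR = ((("a", "b"), "c"), ("d", ("e", "f")))  # cherries ab, ef
SNOWFLAKE = (("a", "b"), (("c", "d"), ("e", "f")))    # cherries ab, cd, ef


def fitch(tree, states):
    """Fitch's algorithm: return (candidate state set, number of changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def score_vector(tree):
    """Parsimony score of `tree` for each pattern, in PATTERNS order."""
    return [fitch(tree, dict(zip(TAXA, p)))[1] for p in PATTERNS]


def insert_everywhere(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to one edge of `tree`."""
    yield (tree, leaf)  # new edge above the current root
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_everywhere(left, leaf):
            yield (new_left, right)
        for new_right in insert_everywhere(right, leaf):
            yield (left, new_right)


def unrooted_trees(taxa=TAXA):
    """All (2n-5)!! unrooted binary trees, each rooted on the last taxon's edge."""
    rooted = [taxa[0]]
    for leaf in taxa[1:-1]:
        rooted = [t for r in rooted for t in insert_everywhere(r, leaf)]
    return [(r, taxa[-1]) for r in rooted]


ALL_SCORES = [score_vector(t) for t in unrooted_trees()]


@lru_cache(maxsize=None)
def best_totals(k):
    """Minimum total score over all trees, for every ordered k-tuple of patterns."""
    return {
        combo: min(sum(s[i] for i in combo) for s in ALL_SCORES)
        for combo in product(range(len(PATTERNS)), repeat=k)
    }


def mp_count(target, k):
    """Number of ordered k-tuples of patterns for which `target` is an MP tree."""
    target_scores = score_vector(target)
    return sum(
        1
        for combo, best in best_totals(k).items()
        if sum(target_scores[i] for i in combo) == best
    )


if __name__ == "__main__":
    for name, tree in (("caterpillar", CATERPILLAR), ("snowflake", SNOWFLAKE)):
        dist = sorted(Counter(score_vector(tree)).items())
        print(name, "score counts:", dist,
              "MP pairs:", mp_count(tree, 2), "of", len(PATTERNS) ** 2)
```

Under these assumptions, both trees have identical score counts: 2, 18, 36 and 8 patterns with scores 0, 1, 2 and 3. With k = 1, every tree is MP for the same 20 patterns. With k = 2, the caterpillar is MP for 672 of the 4096 ordered pairs, while the three-cherry tree is MP for 640. Exhaustive enumeration grows as 64^k, so this approach is only practical for very small k.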


Cite

@article{arxiv.1406.0217,
  title  = {The most parsimonious tree for random data},
  author = {Mareike Fischer and Michelle Galla and Lina Herbst and Mike Steel},
  journal= {arXiv preprint arXiv:1406.0217},
  year   = {2014}
}

Comments

19 pages, 8 figures
