Related papers: Computing the Distribution of a Tree Metric
The Robinson-Foulds (RF) metric is arguably the most widely used measure of phylogenetic tree similarity, despite its well-known shortcomings: For example, moving a single taxon in a tree can result in a tree that has maximum distance to…
Motivated by applications in medical bioinformatics, Khayatian et al. (2024) introduced a family of metrics on Cayley trees (the $k$-RF distance, for $k=0, \ldots, n-2$) and explored their distribution on pairs of random Cayley trees via…
There are several tools available to infer phylogenetic trees, which depict the evolutionary relationships among biological entities such as viral and bacterial strains in infectious outbreaks, or cancerous cells in tumor progression trees.…
Understanding the mutational history of tumor cells is a critical endeavor in unraveling the mechanisms underlying cancer. Since the modeling of tumor cell evolution employs labeled trees, researchers are motivated to develop different…
The Hausdorff distance is a relatively new measure of similarity of graphs. The notion of the Hausdorff distance considers a special kind of a common subgraph of the compared graphs and depends on the structural properties outside of the…
We present a new method for inferring species trees from multi-copy gene trees. Our method is based on a generalization of the Robinson-Foulds (RF) distance to multi-labeled trees (mul-trees), i.e., gene trees in which multiple leaves can…
We consider the numerical taxonomy problem of fitting a positive distance function ${D:{S\choose 2}\rightarrow \mathbb R_{>0}}$ by a tree metric. We want a tree $T$ with positive edge weights and including $S$ among the vertices so that…
Random Forests (RF) is a popular machine learning method for classification and regression problems. It involves a bagging application to decision tree models. One of the primary advantages of the Random Forests model is the reduction in…
Rotation distance between rooted binary trees measures the number of simple operations it takes to transform one tree into another. There are no known polynomial-time algorithms for computing rotation distance. We give an efficient,…
This paper extends the study of fringe trees in random plane trees with a given degree statistic. While previous work established the asymptotic normality of the count of fringe trees isomorphic to a fixed tree, we investigate the case…
The Distributional Random Forest (DRF) is a recently introduced Random Forest algorithm to estimate multivariate conditional distributions. Due to its general estimation procedure, it can be employed to estimate a wide range of targets such…
The Poisson-binomial distribution is useful in many applied problems in engineering, actuarial science, and data mining. The Poisson-binomial distribution models the distribution of the sum of independent but not identically distributed…
We consider a probability measure on cycle-rooted spanning forests (CRSFs) introduced by Kenyon. CRSFs are spanning subgraphs, each connected component of which has a unique cycle; they generalize spanning trees. A generalization of…
We investigate the distribution of the depth of a node containing a specific key or, equivalently, the number of steps needed to retrieve an item stored in a randomly grown binary search tree. Using a representation in terms of mixed and…
We propose a tree-based semi-varying coefficient model for the Conway-Maxwell- Poisson (CMP or COM-Poisson) distribution which is a two-parameter generalization of the Poisson distribution and is flexible enough to capture both…
Random forest (RF) stands out as a highly favored machine learning approach for classification problems. The effectiveness of RF hinges on two key factors: the accuracy of individual trees and the diversity among them. In this study, we…
Maximum parsimony distance is a measure used to quantify the dissimilarity of two unrooted phylogenetic trees. It is NP-hard to compute, and very few positive algorithmic results are known due to its complex combinatorial structure. Here we…
In order to conduct a statistical analysis on a given set of phylogenetic gene trees, we often use a distance measure between two trees. In a statistical distance-based method to analyze discordance between gene trees, it is a key to decide…
We investigate the statistics of trees grown from some initial tree by attaching links to preexisting vertices, with attachment probabilities depending only on the valence of these vertices. We consider the asymptotic mass distribution that…
The recursive and hierarchical structure of full rooted trees is applicable to represent statistical models in various areas, such as data compression, image processing, and machine learning. In most of these cases, the full rooted tree is…