Related papers: Pairwise sequence alignment at arbitrarily large e…
A method based on mapping a symbolic sequence into a set of patterns (strings resulting from the sequence parsing) is proposed as a tool for the reconstruction of ancestral sequences. The set union of patterns comprises all the patterns…
The reconstruction of phylogenies from DNA or protein sequences is a major task of computational evolutionary biology. Common phenomena, notably variations in mutation rates across genomes and incongruences between gene lineage histories,…
Recent advances in high-throughput genomics technologies have resulted in the sequencing of large numbers of (near) complete genomes. These genome sequences are being mined for important functional elements, such as genes. They are also…
Various methods have been developed to analyze the association between organisms and their genomic sequences. Among them, sequence alignment is the most frequently used for comparative analysis of biological genomes. However, the…
Understanding the dynamics of genome rearrangements is a major issue of phylogenetics. Phylogenetics is the study of species evolution. A major goal of the field is to establish evolutionary relationships within groups of species, in order…
We present an efficient phylogenetic reconstruction algorithm allowing insertions and deletions which provably achieves a sequence-length requirement (or sample complexity) growing polynomially in the number of taxa. Our algorithm is…
The analysis of the three-dimensional structure of proteins is an important topic in molecular biochemistry. Structure plays a critical role in defining the function of proteins and is more strongly conserved than amino acid sequence over…
Multiple sequence alignment is a basic procedure in molecular biology, and it is often treated as being essentially a solved computational problem. However, this is not so, and here I review the evidence for this claim, and outline the…
Phylogenetic inference, the task of reconstructing how related sequences evolved from common ancestors, is a central objective in evolutionary genomics. The current state-of-the-art methods exploit probabilistic models of sequence evolution…
Bayesian inference is now a leading technique for reconstructing phylogenetic trees from aligned sequence data. In this short note, we formally show that the maximum posterior tree topology provides a statistically consistent estimate of a…
Sequences of nucleotides (for DNA and RNA) or amino acids (for proteins) are central objects in biology. Among the most important computational problems is that of sequence alignment, i.e. arranging sequences from different organisms in…
Dendrograms are a way to represent evolutionary relationships between organisms. Nowadays, they are inferred by comparing gene or protein sequences, taking into account their differences and similarities. The genetic…
In evolutionary biology, the speciation history of living organisms is represented graphically by a phylogeny, that is, a rooted tree whose leaves correspond to current species and branchings indicate past speciation events. Phylogenies are…
Most major algorithms for phylogenetic tree reconstruction assume that sequences in the analyzed set either do not have any offspring, or that parent sequences can mutate into at most two descendants. The graph resulting from such…
The alignment of biological sequences such as DNA, RNA, and proteins is one of the basic tools for detecting evolutionary patterns, as well as functional/structural characterizations between homologous sequences in different…
The structure of a protein is crucial in determining its functionality, and is much more conserved than sequence during evolution. A key task in structural biology is to compare protein structures in order to determine evolutionary…
The ancestral sequence reconstruction problem is the inference, back in time, of the properties of common sequence ancestors from measured properties of contemporary populations. Standard algorithms for this problem assume independent…
Predicting protein-protein interactions from sequences is an important goal of computational biology. Various sources of information can be used to this end. Starting from the sequences of two interacting protein families, one can use…
The multispecies coalescent process models the genealogical relationships of genes sampled from several species, enabling useful predictions about phenomena such as the discordance between the gene tree and the species phylogeny due to…
Sequence comparison and alignment have had an enormous impact on our understanding of evolution, biology, and disease. Comparison and alignment of biological networks will likely have a similar impact. Existing network alignments use…