Related papers: Universal Features in the Genome-level Evolution o…
We present a combined mean-field and simulation approach to different models describing the dynamics of classes formed by elements that can appear, disappear or copy themselves. These models, related to a paradigm duplication-innovation…
Current-day genomes bear the mark of the evolutionary processes. One of the strongest indications is the sequence homology among families of proteins that perform similar biological functions in different species. The number of proteins in…
We show that simple stochastic models of genome evolution lead to power law asymptotics of protein domain family size distribution. These models, called Birth, Death and Innovation Models (BDIM), represent a special class of balanced…
Successive whole genome duplications have recently been firmly established in all major eukaryote kingdoms. It is not clear, however, how such a dramatic evolutionary process has contributed to shaping the large scale topology of…
Genomes evolve as modules. In prokaryotes (and some eukaryotes), genetic material can be transferred between species and integrated into the genome via homologous or illegitimate recombination. There is little reason to imagine that the…
Genomic duplication-divergence events, which are the primary source of new protein functions, occur stochastically at a wide range of genomic scales, from single gene to whole genome duplications. Clearly, this fundamental evolutionary…
The evolution of the full repertoire of proteins encoded in a given genome is mostly driven by gene duplications, deletions, and sequence modifications of existing proteins. Indirect information about relative rates and other intrinsic…
Protein distributions measured under a broad set of conditions in bacteria and yeast were shown to exhibit a common skewed shape, with variances depending quadratically on means. For bacteria these properties were reproduced by temporal…
Background: Duplication of genes is important for evolution of molecular networks. Many authors have therefore considered gene duplication as a driving force in shaping the topology of molecular networks. In particular it has been noted…
Generative artificial intelligence models learn probability distributions from data and produce novel samples that capture the salient properties of their training sets. Proteins are particularly attractive for such approaches given their…
We show that the protein-protein interaction networks can be surprisingly well described by a very simple evolution model of duplication and divergence. The model exhibits a remarkably rich behavior depending on a single parameter, the…
The next step in the understanding of the genome organization, after the determination of complete sequences, involves proteomics. The proteome includes the whole set of protein-protein interactions, and two recent independent studies have…
The protein folding problem has attracted increasing attention from physicists. The problem has a flavor of statistical mechanics, but possesses the most common feature of most biological problems -- the profound effects of evolution. I…
Traditional domain generalization methods often rely on domain alignment to reduce inter-domain distribution differences and learn domain-invariant representations. However, domain shifts are inherently difficult to eliminate, which limits…
Among several quantitative invariants found in evolutionary genomics, one of the most striking is the scaling of the overall abundance of proteins, or protein domains, sharing a specific functional annotation across genomes of given size.…
In this work we propose a physical model of organismal evolution, where phenotype, organism life expectancy, is directly related to genotype i.e. the stability of its proteins which can be determined exactly in the model. Simulating the…
A Profile Mixture Model is a model of protein evolution, describing sequence data in which sites are assumed to follow many related substitution processes on a single evolutionary tree. The processes depend in part on different amino acid…
Large scale databases are available that contain homologous gene families constructed from hundreds of complete genome sequences from across the three domains of Life. Here we discuss approaches of increasing complexity aimed at extracting…
Much evolutionary information is stored in the fluctuations of protein length distributions. The genome size and non-coding DNA content can be calculated based only on the protein length distributions. So there is an intrinsic relationship…
Proteins, by virtue of their central role in most biological processes, represent one of the key subjects of the study of molecular evolution. Inherent to the indispensability of proteins for living cells is the fact that a given protein…