Related papers: A limiting rule for the variability of coding sequ…
The length of coding sequence series in microbial genomes was regarded as a fluctuating system and characterized by the methods of statistical physics. The distribution and the correlation properties of 50 genomes including bacteria and…
Statistical analysis of distributions of occurrence frequencies of short words in 108 microbial complete genomes reveals the existence of a set of universal "root-sequence lengths" shared by all microbial genomes. These lengths and their…
We show that textual analysis of microbial genomes reveals telling footprints of the early evolution of the genomes. The frequencies of word occurrence of random DNA sequences considered as texts in their four nucleotides are expected to…
With the number of fully-sequenced genomes now well over a hundred it has become possible to start investigating if there are any quantitative regularities in the genetic make-up of genomes. In (physics/0307001), I originally showed that…
In special coordinates (codon position--specific nucleotide frequencies) bacterial genomes form two straight lines in 9-dimensional space: one line for eubacterial genomes, another for archaeal genomes. All the 348 distinct bacterial…
Background. The large-scale pattern of distribution of genes on the chromosomes in the known animal genomes is not well characterized. We hypothesized that individual genes will be distributed on chromosomes in a mathematically ordered…
With the number of sequenced genomes now over one hundred, and the availability of rough functional annotations for a substantial proportion of their genes, it has become possible to study the statistics of gene content across genomes. Here…
The coding and noncoding length sequences constructed from a complete genome are characterised by multifractal analysis. The dimension spectrum $D_{q}$ and its derivative, the 'analogous' specific heat $C_{q}$, are calculated for the coding…
Textual analysis of typical microbial genomes reveals that they have the statistical characteristics of a DNA sequence of a much shorter length. This peculiar property supports an evolutionary model in which a genome evolves by random…
In the human genome, recombination frequency between homologous chromosomes during meiosis is highly correlated with their physical length while it differs significantly when their coding density is considered. Furthermore, it has been…
This paper considers three kinds of length sequences of the complete genome. Detrended fluctuation analysis, spectral analysis, and the mean distance spanned within time $L$ are used to discuss the correlation property of these sequences.…
By creating networks of biochemical pathways, communities of micro-organisms are able to modulate the properties of their environment and even the metabolic processes within their hosts. Next-generation high-throughput sequencing has led to…
We calculate the mutual information function for each of the 24 chromosomes in the human genome. The same correlation pattern is observed regardless of the individual functional features of each chromosome. Moreover, correlations of different…
The diversity revealed by large scale genomics in microbiology is calling into question long held beliefs about genome stability, evolutionary rate, even the definition of a species. MacArthur and Wilson's theory of insular biogeography…
The phenotype of any organism on earth is, in large part, the consequence of interplay between numerous gene products encoded in the genome, and such interplay between gene products affects the evolutionary fate of the genome itself through…
In condensed matter physics, simplified descriptions are obtained by coarse-graining the features of a system at a certain characteristic length, defined as the typical length beyond which some properties are no longer correlated. From a…
Genome length varies widely among organisms, from compact genomes of prokaryotes to vast and complex genomes of eukaryotes. In this study, we theoretically identify the evolutionary pressures that may have driven this divergence in genome…
An approach for approximately calculating the number of genes in a genome is presented, which takes into account the average protein length expected for the species. A number of viral, bacterial and eukaryotic genomes are scrutinized.…
Unraveling the evolutionary forces shaping bacterial diversity can today be tackled using a growing amount of genomic data. While the genome of eukaryotes is highly stable, bacterial genomes from cells of the same species vary widely in…
The main statistical distributions applicable to the analysis of genome architecture and genome tracks are briefly discussed and critically assessed. Although the observed features in distributions of element lengths can be equally well…