Related papers: Data compression and genomes: a two dimensional li…

Prediction of genomic properties and classification of life by protein length distributions

Much evolutionary information is stored in the fluctuations of protein length distributions. The genome size and non-coding DNA content can be calculated based only on the protein length distributions. So there is an intrinsic relationship…

Genomics · Quantitative Biology 2008-06-03 Dirson Jian Li, Shengli Zhang

Genetic Sequence compression using Machine Learning and Arithmetic Encoding Decoding Techniques

We live in a period in which bioinformatics is rapidly expanding. A significant quantity of genomic data has been produced as a result of advances in high-throughput genome sequencing technology, raising concerns about the costs…

Quantitative Methods · Quantitative Biology 2023-03-10 Mehedi Hasan Sarkar, Adnan Ferdous Ashrafi

Analysis of Compression Techniques for DNA Sequence Data

Biological data mainly comprises deoxyribonucleic acid (DNA) and protein sequences. These biomolecules are present in all human cells. Due to the self-replicating property of DNA, it is a key constituent of genetic…

Other Quantitative Biology · Quantitative Biology 2020-06-04 Shakeela Bibi, Javed Iqbal, Adnan Iftekhar, Mir Hassan

Information and order of genomic sequences within chromosomes as identified by complexity theory. An integrated methodology

Complexity metrics and machine learning (ML) models have been utilized to analyze the lengths of segmental genomic entities, such as exons, introns, intergenic, and repeat/unique DNA sequences, in each of the 22 human chromosomes. The purpose…

Biological Physics · Physics 2020-04-24 L. P. Karakatsanis, E. G. Pavlos, G. Tsoulouhas, G. L. Stamokostas, T. L. Mosbruger, J. L. Duke, G. P. Pavlos, D. S. Monos

Simplifying the mosaic description of DNA sequences

By using the Jensen-Shannon divergence, genomic DNA can be divided into compositionally distinct domains through a standard recursive segmentation procedure. Each domain, while significantly different from its neighbours, may however share…

Biological Physics · Physics 2009-11-07 Rajeev K. Azad, J. Subba Rao, Wentian Li, Ramakrishna Ramaswamy

Determining whether the non-protein-coding DNA sequences are in a complex interactive relationship by using an artificial intelligence method

Non-protein-coding regions of the human genome contain many complex patterns that regulate cellular activity. Studying the human genome is limited by the lack of understanding of its features and their complex interactions. However,…

Genomics · Quantitative Biology 2017-08-15 Kerim Arioglu, Umut Eser

Sublinear Growth of Information in DNA Sequences

We introduce a novel method to analyse complete genomes and recognise some distinctive features by means of an adaptive compression algorithm, which is not DNA-oriented. We study the Information Content as a function of the number of…

Genomics · Quantitative Biology 2007-05-23 Giulia Menconi

Genome Compression Against a Reference

Being able to store and transmit human genome sequences is an important part of genomic research and industrial applications. The complete human genome has 3.1 billion base pairs (haploid), and storing the entire genome naively takes about…

Genomics · Quantitative Biology 2020-10-07 Anirduddha Laud, Gaurav Menghani, Madhava Keralapura

Symbolic Complexity for Nucleotide Sequences: A Sign of the Genome Structure

We introduce a method to estimate the complexity function of symbolic dynamical systems from a finite sequence of symbols. We test this complexity estimator on several symbolic dynamical systems whose complexity functions are known exactly.…

Populations and Evolution · Quantitative Biology 2017-01-19 R. Salgado-Garcia, E. Ugalde

A complexity measure for symbolic sequences and applications to DNA

We introduce a complexity measure for symbolic sequences. Starting from a segmentation procedure of the sequence, we define its complexity as the entropy of the distribution of lengths of the domains of relatively uniform composition in…

Classical Physics · Physics 2007-05-23 Ana P. Majtey, Ramon Roman-Roldan, Pedro W. Lamberti

GDC 2: Compression of large collections of genomes

The falling price of high-throughput genome sequencing is changing the landscape of modern genomics. A number of large-scale projects aimed at sequencing many human genomes are in progress. Genome sequencing also becomes an important aid…

Data Structures and Algorithms · Computer Science 2017-03-03 Sebastian Deorowicz, Agnieszka Danek, Marcin Niemiec

Comparing Machine Learning Algorithms with or without Feature Extraction for DNA Classification

The classification of DNA sequences is a key research area in bioinformatics as it enables researchers to conduct genomic analysis and detect possible diseases. In this paper, three state-of-the-art algorithms, namely Convolutional Neural…

Other Quantitative Biology · Quantitative Biology 2020-11-03 Xiangxie Zhang, Ben Beinke, Berlian Al Kindhi, Marco Wiering

DNA Lossless Differential Compression Algorithm based on Similarity of Genomic Sequence Database

Modern biological science produces vast amounts of genomic sequence data. This is fuelling the need for efficient algorithms for sequence compression and analysis. Data compression and the associated techniques coming from information…

Data Structures and Algorithms · Computer Science 2011-09-05 Heba Afify, Muhammad Islam, Manal Abdel Wahed

Compaction of bacterial genomic DNA: Clarifying the concepts

The unconstrained genomic DNA of bacteria forms a coil whose volume exceeds 1000 times the volume of the cell. Since prokaryotes lack a membrane-bound nucleus, in sharp contrast with eukaryotes, the DNA may consequently be expected to…

Biomolecules · Quantitative Biology 2015-09-09 Marc Joyeux

Closing the complexity gap of the double distance problem

Genome rearrangement has been an active area of research in computational comparative genomics for the last three decades. While initially mostly an interesting algorithmic endeavor, now the practical application by applying rearrangement…

Computational Complexity · Computer Science 2025-07-23 Luís Cunha, Thiago Lopes, Uéverton Souza, Leonard Bohnenkämper, Marília D. V. Braga, Jens Stoye

Organisation and dynamics of individual DNA segments in topologically complex genomes

Capturing the physical organisation and dynamics of genomic regions is one of the major open challenges in biology. The kinetoplast DNA (kDNA) is a topologically complex genome, made of thousands of DNA (mini and maxi) circles interlinked…

Soft Condensed Matter · Physics 2025-04-16 Saminathan Ramakrishnan, Auro Varat Patnaik, Guglielmo Grillo, Luca Tubiana, Davide Michieletto

Information Analysis of DNA Sequences

The problem of differentiating the informational content of coding (exons) and non-coding (introns) regions of a DNA sequence is one of the central problems of genomics. The introns are estimated to be nearly 95% of the DNA and since they…

Computational Engineering, Finance, and Science · Computer Science 2010-10-21 Riyazuddin Mohammed

Dealing with complexity of biological systems: from data to models

Four chapters of the synthesis represent four major areas of my research interests: 1) data analysis in molecular biology, 2) mathematical modeling of biological networks, 3) genome evolution, and 4) cancer systems biology. The first…

Quantitative Methods · Quantitative Biology 2014-04-08 Andrei Zinovyev

A Compressed Self-Index for Genomic Databases

Advances in DNA sequencing technology will soon result in databases of thousands of genomes. Within a species, individuals' genomes are almost exact copies of each other; e.g., any two human genomes are 99.9% the same. Relative Lempel-Ziv…

Data Structures and Algorithms · Computer Science 2011-11-08 Travis Gagie, Juha Kärkkäinen, Yakov Nekrich, Simon J. Puglisi

A Fixed-Length Coding Algorithm for DNA Sequence Compression

While achieving a compression ratio of 2.0 bits/base, the new algorithm codes non-N bases in fixed length. It dramatically reduces coding and decoding time compared with previous DNA compression algorithms and some universal compression…

Information Theory · Computer Science 2007-07-16 Jie Liu, Sheng Bao, Zhiqiang Jing, Shi Chen