Related papers: Genetic Sequence compression using Machine Learnin…

GDC 2: Compression of large collections of genomes

The fall of prices of the high-throughput genome sequencing changes the landscape of modern genomics. A number of large scale projects aimed at sequencing many human genomes are in progress. Genome sequencing also becomes an important aid…

Data Structures and Algorithms · Computer Science 2017-03-03 Sebastian Deorowicz, Agnieszka Danek, Marcin Niemiec

Analysis of Compression Techniques for DNA Sequence Data

Biological data mainly comprises deoxyribonucleic acid (DNA) and protein sequences. These are the biomolecules which are present in all cells of human beings. Due to the self-replicating property of DNA, it is a key constituent of genetic…

Other Quantitative Biology · Quantitative Biology 2020-06-04 Shakeela Bibi, Javed Iqbal, Adnan Iftekhar, Mir Hassan

An Efficient Biological Sequence Compression Technique Using LUT And Repeat In The Sequence

Data compression plays an important role to deal with high volumes of DNA sequences in the field of Bioinformatics. Again data compression techniques directly affect the alignment of DNA sequences. So the time needed to decompress a…

Computational Engineering, Finance, and Science · Computer Science 2012-11-13 Subhankar Roy, Sunirmal Khatua, Sudipta Roy, Samir K. Bandyopadhyay

Efficient Constraining of Transcoding in DNA-Based Image Storage

DNA has emerged as a promising alternative for long-term data storage due to its high capacity, durability, and low-energy potential. However, storing data in DNA presents several challenges. First, it requires complex and costly…

Other Quantitative Biology · Quantitative Biology 2025-11-20 Sara Al Sayyed, Aline Roumy, Thomas Maugey

DNA data storage, sequencing data-carrying DNA

DNA is a leading candidate as the next archival storage media due to its density, durability and sustainability. To read (and write) data DNA storage exploits technology that has been developed over decades to sequence naturally occurring…

Emerging Technologies · Computer Science 2022-05-12 Jasmine Quah, Omer Sella, Thomas Heinis

Genbit Compress Tool(GBC): A Java-Based Tool to Compress DNA Sequences and Compute Compression Ratio(bits/base) of Genomes

We present a Compression Tool, "GenBit Compress", for genetic sequences based on our new proposed "GenBit Compress Algorithm". Our Tool achieves the best compression ratios for Entire Genome (DNA sequences). Significantly better…

Mathematical Software · Computer Science 2010-07-15 P. Raja Rajeswari, Allam Apparo, V. K. Kumar

Reference Based Genome Compression

DNA sequencing technology has advanced to a point where storage is becoming the central bottleneck in the acquisition and mining of more data. Large amounts of data are vital for genomics research, and generic compression tools, while…

Information Theory · Computer Science 2016-11-15 Bobbie Chern, Idoia Ochoa, Alexandros Manolakos, Albert No, Kartik Venkat, Tsachy Weissman

GeneFormer: Learned Gene Compression using Transformer-based Context Modeling

With the development of gene sequencing technology, an explosive growth of gene data has been witnessed. And the storage of gene data has become an important issue. Traditional gene data compression methods rely on general software like…

Machine Learning · Computer Science 2023-02-01 Zhanbei Cui, Yu Liao, Tongda Xu, Yan Wang

Image Storage on Synthetic DNA Using Autoencoders

Over the past years, the ever-growing trend on data storage demand, more specifically for "cold" data (rarely accessed data), has motivated research for alternative systems of data storage. Because of its biochemical characteristics,…

Machine Learning · Computer Science 2022-03-21 Xavier Pic, Marc Antonini

Genomic Compression with Read Alignment at the Decoder

We propose a new compression scheme for genomic data given as sequence fragments called reads. The scheme uses a reference genome at the decoder side only, freeing the encoder from the burdens of storing references and performing…

Information Theory · Computer Science 2023-02-10 Yotam Gershon, Yuval Cassuto

Implicit Neural Multiple Description for DNA-based data storage

DNA exhibits remarkable potential as a data storage solution due to its impressive storage density and long-term stability, stemming from its inherent biomolecular structure. However, developing this novel medium comes with its own set of…

Image and Video Processing · Electrical Eng. & Systems 2023-09-14 Trung Hieu Le, Xavier Pic, Jeremy Mateos, Marc Antonini

DNA Lossless Differential Compression Algorithm based on Similarity of Genomic Sequence Database

Modern biological science produces vast amounts of genomic sequence data. This is fuelling the need for efficient algorithms for sequence compression and analysis. Data compression and the associated techniques coming from information…

Data Structures and Algorithms · Computer Science 2011-09-05 Heba Afify, Muhammad Islam, Manal Abdel Wahed

DNA Sequence Classification with Compressors

Recent studies in DNA sequence classification have leveraged sophisticated machine learning techniques, achieving notable accuracy in categorizing complex genomic data. Among these, methods such as k-mer counting have proven effective in…

Genomics · Quantitative Biology 2024-01-26 Şükrü Ozan

A Fixed-Length Coding Algorithm for DNA Sequence Compression

While achieving a compression ratio of 2.0 bits/base, the new algorithm codes non-N bases in fixed length. It dramatically reduces coding and decoding time compared with previous DNA compression algorithms and some universal compression…

Information Theory · Computer Science 2007-07-16 Jie Liu, Sheng Bao, Zhiqiang Jing, Shi Chen

MQ-Coder inspired arithmetic coder for synthetic DNA data storage

Over the past years, the ever-growing trend on data storage demand, more specifically for "cold" data (i.e. rarely accessed), has motivated research for alternative systems of data storage. Because of its biochemical characteristics,…

Image and Video Processing · Electrical Eng. & Systems 2023-09-15 Xavier Pic, Melpomeni Dimopoulou, Eva Gil San Antonio, Marc Antonini

Image storage on synthetic DNA using compressive autoencoders and DNA-adapted entropy coders

Over the past years, the ever-growing trend on data storage demand, more specifically for "cold" data (rarely accessed data), has motivated research for alternative systems of data storage. Because of its biochemical characteristics,…

Image and Video Processing · Electrical Eng. & Systems 2023-06-23 Xavier Pic, Eva Gil San Antonio, Melpomeni Dimopoulou, Marc Antonini

Reference Sequence Construction for Relative Compression of Genomes

Relative compression, where a set of similar strings are compressed with respect to a reference string, is a very effective method of compressing DNA datasets containing multiple similar sequences. Relative compression is fast to perform…

Quantitative Methods · Quantitative Biology 2011-06-21 Shanika Kuruppu, Simon Puglisi, Justin Zobel

Compression of structured high-throughput sequencing data

Large biological datasets are being produced at a rapid pace and create substantial storage challenges, particularly in the domain of high-throughput sequencing (HTS). Most approaches currently used to store HTS data are either unable to…

Quantitative Methods · Quantitative Biology 2014-03-05 Fabien Campagne, Kevin C. Dorff, Nyasha Chambwe, James T. Robinson, Jill P. Mesirov, Thomas D. Wu

Deep DNA Storage: Scalable and Robust DNA Storage via Coding Theory and Deep Learning

DNA-based storage is an emerging technology that enables digital information to be archived in DNA molecules. This method enjoys major advantages over magnetic and optical storage solutions such as exceptional information density, enhanced…

Information Theory · Computer Science 2024-03-13 Daniella Bar-Lev, Itai Orr, Omer Sabary, Tuvi Etzion, Eitan Yaakobi

Sublinear Growth of Information in DNA Sequences

We introduce a novel method to analyse complete genomes and recognise some distinctive features by means of an adaptive compression algorithm, which is not DNA-oriented. We study the Information Content as a function of the number of…

Genomics · Quantitative Biology 2007-05-23 Giulia Menconi