Related papers: A DNA Sequence Compression Algorithm Based on LUT …

A Fixed-Length Coding Algorithm for DNA Sequence Compression

While achieving a compression ratio of 2.0 bits/base, the new algorithm codes non-N bases in fixed length. It dramatically reduces coding and decoding time compared with previous DNA compression algorithms and some universal compression…

Information Theory · Computer Science 2007-07-16 Jie Liu , Sheng Bao , Zhiqiang Jing , Shi Chen

Engineering Relative Compression of Genomes

Technological progress in DNA sequencing boosts genomic database growth at a faster and faster rate. Compression, accompanied by random access capabilities, is the key to maintaining those huge amounts of data. In this paper we present an…

Computational Engineering, Finance, and Science · Computer Science 2011-03-14 Szymon Grabowski , Sebastian Deorowicz

An Efficient Biological Sequence Compression Technique Using LUT And Repeat In The Sequence

Data compression plays an important role in dealing with high volumes of DNA sequences in the field of Bioinformatics. Again, data compression techniques directly affect the alignment of DNA sequences. So the time needed to decompress a…

Computational Engineering, Finance, and Science · Computer Science 2012-11-13 Subhankar Roy , Sunirmal Khatua , Sudipta Roy , Samir K. Bandyopadhyay

Genbit Compress Tool(GBC): A Java-Based Tool to Compress DNA Sequences and Compute Compression Ratio(bits/base) of Genomes

We present a Compression Tool, "GenBit Compress", for genetic sequences based on our new proposed "GenBit Compress Algorithm". Our Tool achieves the best compression ratios for Entire Genome (DNA sequences). Significantly better…

Mathematical Software · Computer Science 2010-07-15 P. Raja Rajeswari , Allam Apparo , V. K. Kumar

DNA Lossless Differential Compression Algorithm based on Similarity of Genomic Sequence Database

Modern biological science produces vast amounts of genomic sequence data. This is fuelling the need for efficient algorithms for sequence compression and analysis. Data compression and the associated techniques coming from information…

Data Structures and Algorithms · Computer Science 2011-09-05 Heba Afify , Muhammad Islam , Manal Abdel Wahed

Reference Based Genome Compression

DNA sequencing technology has advanced to a point where storage is becoming the central bottleneck in the acquisition and mining of more data. Large amounts of data are vital for genomics research, and generic compression tools, while…

Information Theory · Computer Science 2016-11-15 Bobbie Chern , Idoia Ochoa , Alexandros Manolakos , Albert No , Kartik Venkat , Tsachy Weissman

Genetic Sequence compression using Machine Learning and Arithmetic Encoding Decoding Techniques

We live in a period where bio-informatics is rapidly expanding; a significant quantity of genomic data has been produced as a result of the advancement of high-throughput genome sequencing technology, raising concerns about the costs…

Quantitative Methods · Quantitative Biology 2023-03-10 Mehedi Hasan Sarkar , Adnan Ferdous Ashrafi

Disk-based genome sequencing data compression

Motivation: High-coverage sequencing data have significant, yet hard to exploit, redundancy. Most FASTQ compressors cannot efficiently compress the DNA stream of large datasets, since the redundancy between overlapping reads cannot be…

Data Structures and Algorithms · Computer Science 2014-09-19 Szymon Grabowski , Sebastian Deorowicz , Łukasz Roguski

DNA-Based Storage: Trends and Methods

We provide an overview of current approaches to DNA-based storage system design and accompanying synthesis, sequencing and editing methods. We also introduce and analyze a suite of new constrained coding schemes for both archival and random…

Emerging Technologies · Computer Science 2015-07-08 S. M. Hossein Tabatabaei Yazdi , Han Mao Kiah , Eva Ruiz Garcia , Jian Ma , Huimin Zhao , Olgica Milenkovic

Analysis of Compression Techniques for DNA Sequence Data

Biological data mainly comprises Deoxyribonucleic acid (DNA) and protein sequences. These are the biomolecules which are present in all cells of human beings. Due to the self-replicating property of DNA, it is a key constituent of genetic…

Other Quantitative Biology · Quantitative Biology 2020-06-04 Shakeela Bibi , Javed Iqbal , Adnan Iftekhar , Mir Hassan

Coding for Optimized Writing Rate in DNA Storage

A method for encoding information in DNA sequences is described. The method is based on the precision-resolution framework, and is aimed to work in conjunction with a recently suggested terminator-free template independent DNA synthesis…

Information Theory · Computer Science 2020-05-14 Siddharth Jain , Farzad Farnoud , Moshe Schwartz , Jehoshua Bruck

Compression of high throughput sequencing data with probabilistic de Bruijn graph

Motivation: Data volumes generated by next-generation sequencing technologies are now a major concern, both for storage and transmission. This triggered the need for more efficient methods than general purpose compression tools, such as…

Data Structures and Algorithms · Computer Science 2014-12-19 Gaëtan Benoit , Claire Lemaitre , Dominique Lavenier , Guillaume Rizk

DNA Sequence Classification with Compressors

Recent studies in DNA sequence classification have leveraged sophisticated machine learning techniques, achieving notable accuracy in categorizing complex genomic data. Among these, methods such as k-mer counting have proven effective in…

Genomics · Quantitative Biology 2024-01-26 Şükrü Ozan

A biologically constrained encoding solution for long-term storage of images onto synthetic DNA

Living in the age of the digital media explosion, the amount of data that is being stored increases dramatically. However, even if existing storage systems suggest efficiency in capacity, they are lacking in durability. Hard disks, flash,…

Image and Video Processing · Electrical Eng. & Systems 2019-04-08 Melpomeni Dimopoulou , Marc Antonini , Pascal Barbry , Raja Appuswamy

Reference Sequence Construction for Relative Compression of Genomes

Relative compression, where a set of similar strings are compressed with respect to a reference string, is a very effective method of compressing DNA datasets containing multiple similar sequences. Relative compression is fast to perform…

Quantitative Methods · Quantitative Biology 2011-06-21 Shanika Kuruppu , Simon Puglisi , Justin Zobel

FastqZip: An Improved Reference-Based Genome Sequence Lossy Compression Framework

Storing and archiving data produced by next-generation sequencing (NGS) is a huge burden for research institutions. Reference-based compression algorithms are effective in dealing with these data. Our work focuses on compressing FASTQ…

Information Theory · Computer Science 2024-04-04 Yuanjian Liu , Huihao Luo , Zhijun Han , Yao Hu , Yehui Yang , Kyle Chard , Sheng Di , Ian Foster , Jiesheng Wu

Fast low-level pattern matching algorithm

This paper focuses on pattern matching in the DNA sequence. It was inspired by a previously reported method that proposes encoding both pattern and sequence using prime numbers. Although fast, the method is limited to rather small pattern…

Computer Vision and Pattern Recognition · Computer Science 2016-11-21 Janja Paliska Soldo , Ana Sovic Krzic , Damir Sersic

A Compressed Self-Index for Genomic Databases

Advances in DNA sequencing technology will soon result in databases of thousands of genomes. Within a species, individuals' genomes are almost exact copies of each other; e.g., any two human genomes are 99.9% the same. Relative Lempel-Ziv…

Data Structures and Algorithms · Computer Science 2011-11-08 Travis Gagie , Juha Kärkkäinen , Yakov Nekrich , Simon J. Puglisi

Optimizing Sequencing Coverage Depth in DNA Storage: Insights From DNA Storage Data

DNA storage is now being considered as a new archival storage method for its durability and high information density, but it still faces challenges such as high costs and low throughput. By reducing sequencing sample size for decoding…

Information Theory · Computer Science 2025-04-22 Ruiying Cao , Xin Chen

Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform

Motivation: The Burrows-Wheeler transform (BWT) is the foundation of many algorithms for compression and indexing of text data, but the cost of computing the BWT of very large string collections has prevented these techniques from being…

Data Structures and Algorithms · Computer Science 2015-03-20 Anthony J. Cox , Markus J. Bauer , Tobias Jakobi , Giovanna Rosone