Related papers: Disk-based genome sequencing data compression

200 papers

Motivation: The Burrows-Wheeler transform (BWT) is the foundation of many algorithms for compression and indexing of text data, but the cost of computing the BWT of very large string collections has prevented these techniques from being…

Data Structures and Algorithms · Computer Science 2015-03-20 Anthony J. Cox , Markus J. Bauer , Tobias Jakobi , Giovanna Rosone

Modern biological science produces vast amounts of genomic sequence data. This is fuelling the need for efficient algorithms for sequence compression and analysis. Data compression and the associated techniques coming from information…

Data Structures and Algorithms · Computer Science 2011-09-05 Heba Afify , Muhammad Islam , Manal Abdel Wahed

DNA sequencing technology has advanced to a point where storage is becoming the central bottleneck in the acquisition and mining of more data. Large amounts of data are vital for genomics research, and generic compression tools, while…

Information Theory · Computer Science 2016-11-15 Bobbie Chern , Idoia Ochoa , Alexandros Manolakos , Albert No , Kartik Venkat , Tsachy Weissman

Biological data mainly comprises deoxyribonucleic acid (DNA) and protein sequences. These are the biomolecules present in all human cells. Due to the self-replicating property of DNA, it is a key constituent of genetic…

Other Quantitative Biology · Quantitative Biology 2020-06-04 Shakeela Bibi , Javed Iqbal , Adnan Iftekhar , Mir Hassan

We live in a period in which bioinformatics is rapidly expanding. A significant quantity of genomic data has been produced as a result of advances in high-throughput genome sequencing technology, raising concerns about the costs…

Quantitative Methods · Quantitative Biology 2023-03-10 Mehedi Hasan Sarkar , Adnan Ferdous Ashrafi

The falling price of high-throughput genome sequencing is changing the landscape of modern genomics. A number of large-scale projects aimed at sequencing many human genomes are in progress. Genome sequencing is also becoming an important aid…

Data Structures and Algorithms · Computer Science 2017-03-03 Sebastian Deorowicz , Agnieszka Danek , Marcin Niemiec

We present a compression tool, "GenBit Compress", for genetic sequences, based on our newly proposed "GenBit Compress Algorithm". Our tool achieves the best compression ratios for entire genomes (DNA sequences). Significantly better…

Mathematical Software · Computer Science 2010-07-15 P. Raja Rajeswari , Allam Apparo , V. K. Kumar

DNA storage is now being considered as a new archival storage method because of its durability and high information density, but it still faces challenges such as high costs and low throughput. By reducing sequencing sample size for decoding…

Information Theory · Computer Science 2025-04-22 Ruiying Cao , Xin Chen

Storing and archiving data produced by next-generation sequencing (NGS) is a huge burden for research institutions. Reference-based compression algorithms are effective in dealing with these data. Our work focuses on compressing FASTQ…

Information Theory · Computer Science 2024-04-04 Yuanjian Liu , Huihao Luo , Zhijun Han , Yao Hu , Yehui Yang , Kyle Chard , Sheng Di , Ian Foster , Jiesheng Wu

DNA is a leading candidate for the next archival storage medium due to its density, durability and sustainability. To read (and write) data, DNA storage exploits technology that has been developed over decades to sequence naturally occurring…

Emerging Technologies · Computer Science 2022-05-12 Jasmine Quah , Omer Sella , Thomas Heinis

Being able to store and transmit human genome sequences is an important part of genomic research and industrial applications. The complete human genome has 3.1 billion base pairs (haploid), and storing the entire genome naively takes about…

Genomics · Quantitative Biology 2020-10-07 Anirduddha Laud , Gaurav Menghani , Madhava Keralapura

Motivation: Next Generation Sequencing technologies revolutionized many fields in biology by enabling the fast and cheap sequencing of large amounts of genomic data. The ever increasing sequencing capacities enabled by current sequencing…

Genomics · Quantitative Biology 2012-07-24 Himanshu Asnani , Dinesh Bharadia , Mainak Chowdhury , Idoia Ochoa , Itai Sharon , Tsachy Weissman

While achieving a compression ratio of 2.0 bits/base, the new algorithm codes non-N bases in fixed length. It dramatically reduces coding and decoding time compared with previous DNA compression algorithms and some universal compression…

Information Theory · Computer Science 2007-07-16 Jie Liu , Sheng Bao , Zhiqiang Jing , Shi Chen

Motivation: Data volumes generated by next-generation sequencing technologies are now a major concern, both for storage and transmission. This triggered the need for more efficient methods than general-purpose compression tools, such as…

Data Structures and Algorithms · Computer Science 2014-12-19 Gaëtan Benoit , Claire Lemaitre , Dominique Lavenier , Guillaume Rizk

This article introduces a new DNA sequence compression algorithm based on a LUT and the LZ77 algorithm. Combining a LUT-based pre-coding routine with an LZ77 compression routine, this algorithm can approach a compression ratio of 1.9 bits…

Information Theory · Computer Science 2007-07-16 Sheng Bao , Shi Chen , Zhiqiang Jing , Ran Ren

We propose a new compression scheme for genomic data given as sequence fragments called reads. The scheme uses a reference genome at the decoder side only, freeing the encoder from the burdens of storing references and performing…

Information Theory · Computer Science 2023-02-10 Yotam Gershon , Yuval Cassuto

Although the expenses associated with DNA sequencing have been rapidly decreasing, the current cost of sequencing information stands at roughly $120/GB, which is dramatically more expensive than reading from existing archival storage…

Discrete Mathematics · Computer Science 2023-11-30 Daniella Bar-Lev , Omer Sabary , Ryan Gabrys , Eitan Yaakobi

DNA-based data storage has been attracting significant attention due to its extremely high data storage density, low power consumption, and long duration compared to conventional data storage media. Despite the recent advancements in DNA…

Information Theory · Computer Science 2024-11-12 Yi Ding , Xuan He , Tuan Thanh Nguyen , Wentu Song , Zohar Yakhini , Eitan Yaakobi , Linqiang Pan , Xiaohu Tang , Kui Cai

Data compression plays an important role in dealing with high volumes of DNA sequences in the field of bioinformatics. In addition, data compression techniques directly affect the alignment of DNA sequences, so the time needed to decompress a…

Computational Engineering, Finance, and Science · Computer Science 2012-11-13 Subhankar Roy , Sunirmal Khatua , Sudipta Roy , Samir K. Bandyopadhyay

DNA-based storage is an emerging technology that enables digital information to be archived in DNA molecules. This method enjoys major advantages over magnetic and optical storage solutions such as exceptional information density, enhanced…

Information Theory · Computer Science 2024-03-13 Daniella Bar-Lev , Itai Orr , Omer Sabary , Tuvi Etzion , Eitan Yaakobi