Related papers: AMGC: Adaptive match-based genomic compression alg…

FastqZip: An Improved Reference-Based Genome Sequence Lossy Compression Framework

Storing and archiving data produced by next-generation sequencing (NGS) is a huge burden for research institutions. Reference-based compression algorithms are effective in dealing with these data. Our work focuses on compressing FASTQ…

Information Theory · Computer Science 2024-04-04 Yuanjian Liu , Huihao Luo , Zhijun Han , Yao Hu , Yehui Yang , Kyle Chard , Sheng Di , Ian Foster , Jiesheng Wu

Lossy Compression of Quality Values via Rate Distortion Theory

Motivation: Next Generation Sequencing technologies revolutionized many fields in biology by enabling the fast and cheap sequencing of large amounts of genomic data. The ever increasing sequencing capacities enabled by current sequencing…

Genomics · Quantitative Biology 2012-07-24 Himanshu Asnani , Dinesh Bharadia , Mainak Chowdhury , Idoia Ochoa , Itai Sharon , Tsachy Weissman

DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies

(An updated version of this manuscript has been accepted to Scientific Reports in 2016, please refer to http://www.nature.com/articles/srep31900) The highly anticipated transition from next generation sequencing (NGS) to third generation…

Genomics · Quantitative Biology 2016-09-06 Chengxi Ye , Chris Hill , Shigang Wu , Jue Ruan , Zhanshan Ma

AMAS: optimizing the partition and filtration of adaptive seeds to speed up read mapping

Background: Identifying all possible mapping locations of next-generation sequencing (NGS) reads is highly essential in several applications such as prediction of genomic variants or protein binding motifs located in repeat regions, isoform…

Genomics · Quantitative Biology 2020-03-25 Ngoc Hieu Tran , Xin Chen

Genomic Compression with Read Alignment at the Decoder

We propose a new compression scheme for genomic data given as sequence fragments called reads. The scheme uses a reference genome at the decoder side only, freeing the encoder from the burdens of storing references and performing…

Information Theory · Computer Science 2023-02-10 Yotam Gershon , Yuval Cassuto

SAGe: A Lightweight Algorithm-Architecture Co-Design for Mitigating the Data Preparation Bottleneck in Large-Scale Genome Sequence Analysis

Genome sequence analysis, which examines the DNA sequences of organisms, drives advances in many critical medical and biotechnological fields. Given its importance and the exponentially growing volumes of genomic sequence data, there are…

Hardware Architecture · Computer Science 2026-01-26 Nika Mansouri Ghiasi , Talu Güloglu , Harun Mustafa , Can Firtina , Konstantina Koliogeorgi , Konstantinos Kanellopoulos , Haiyu Mao , Rakesh Nadig , Mohammad Sadrosadati , Jisung Park , Onur Mutlu

Genbit Compress Tool(GBC): A Java-Based Tool to Compress DNA Sequences and Compute Compression Ratio(bits/base) of Genomes

We present a Compression Tool, "GenBit Compress", for genetic sequences based on our new proposed "GenBit Compress Algorithm". Our Tool achieves the best compression ratios for Entire Genome (DNA sequences). Significantly better…

Mathematical Software · Computer Science 2010-07-15 P. Raja Rajeswari , Allam Apparo , V. K. Kumar

Engineering Relative Compression of Genomes

Technology progress in DNA sequencing boosts the genomic database growth at faster and faster rate. Compression, accompanied with random access capabilities, is the key to maintain those huge amounts of data. In this paper we present an…

Computational Engineering, Finance, and Science · Computer Science 2011-03-14 Szymon Grabowski , Sebastian Deorowicz

Genetic Sequence compression using Machine Learning and Arithmetic Encoding Decoding Techniques

We live in a period where bio-informatics is rapidly expanding, a significant quantity of genomic data has been produced as a result of the advancement of high-throughput genome sequencing technology, raising concerns about the costs…

Quantitative Methods · Quantitative Biology 2023-03-10 Mehedi Hasan Sarkar , Adnan Ferdous Ashrafi

Retaining Key Information under High Compression Ratios: Query-Guided Compressor for LLMs

The growing popularity of Large Language Models has sparked interest in context compression for Large Language Models (LLMs). However, the performance of previous methods degrades dramatically as compression ratios increase, sometimes even…

Computation and Language · Computer Science 2024-06-18 Zhiwei Cao , Qian Cao , Yu Lu , Ningxin Peng , Luyang Huang , Shanbo Cheng , Jinsong Su

Reference Based Genome Compression

DNA sequencing technology has advanced to a point where storage is becoming the central bottleneck in the acquisition and mining of more data. Large amounts of data are vital for genomics research, and generic compression tools, while…

Information Theory · Computer Science 2016-11-15 Bobbie Chern , Idoia Ochoa , Alexandros Manolakos , Albert No , Kartik Venkat , Tsachy Weissman

Reference Sequence Construction for Relative Compression of Genomes

Relative compression, where a set of similar strings are compressed with respect to a reference string, is a very effective method of compressing DNA datasets containing multiple similar sequences. Relative compression is fast to perform…

Quantitative Methods · Quantitative Biology 2011-06-21 Shanika Kuruppu , Simon Puglisi , Justin Zobel

ASMCap: An Approximate String Matching Accelerator for Genome Sequence Analysis Based on Capacitive Content Addressable Memory

Genome sequence analysis is a powerful tool in medical and scientific research. Considering the inevitable sequencing errors and genetic variations, approximate string matching (ASM) has been adopted in practice for genome sequencing.…

Hardware Architecture · Computer Science 2023-02-16 Hongtao Zhong , Zhonghao Chen , Wenqin Huangfu , Chen Wang , Yixin Xu , Tianyi Wang , Yao Yu , Yongpan Liu , Vijaykrishnan Narayanan , Huazhong Yang , Xueqing Li

An Integrated Genomics Workflow Tool: Simulating Reads, Evaluating Read Alignments, and Optimizing Variant Calling Algorithms

Next-generation sequencing (NGS) is a pivotal technique in genome sequencing due to its high throughput, rapid results, cost-effectiveness, and enhanced accuracy. Its significance extends across various domains, playing a crucial role in…

Genomics · Quantitative Biology 2025-04-28 Fathima Nuzla Ismail , Shanika Amarasoma

Compression of high throughput sequencing data with probabilistic de Bruijn graph

Motivation: Data volumes generated by next-generation sequencing technologies is now a major concern, both for storage and transmission. This triggered the need for more efficient methods than general purpose compression tools, such as…

Data Structures and Algorithms · Computer Science 2014-12-19 Gaëtan Benoit , Claire Lemaitre , Dominique Lavenier , Guillaume Rizk

A Survey on GAN Acceleration Using Memory Compression Technique

Since its invention, Generative adversarial networks (GANs) have shown outstanding results in many applications. Generative Adversarial Networks are powerful yet, resource-hungry deep-learning models. Their main difference from ordinary…

Machine Learning · Computer Science 2021-08-17 Dina Tantawy , Mohamed Zahran , Amr Wassal

DNA Lossless Differential Compression Algorithm based on Similarity of Genomic Sequence Database

Modern biological science produces vast amounts of genomic sequence data. This is fuelling the need for efficient algorithms for sequence compression and analysis. Data compression and the associated techniques coming from information…

Data Structures and Algorithms · Computer Science 2011-09-05 Heba Afify , Muhammad Islam , Manal Abdel Wahed

Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform

Motivation: The Burrows-Wheeler transform (BWT) is the foundation of many algorithms for compression and indexing of text data, but the cost of computing the BWT of very large string collections has prevented these techniques from being…

Data Structures and Algorithms · Computer Science 2015-03-20 Anthony J. Cox , Markus J. Bauer , Tobias Jakobi , Giovanna Rosone

Alignment-Free Sequence Analysis and Applications

Genome and metagenome comparisons based on large amounts of next-generation sequencing (NGS) data pose significant challenges for alignment-based approaches due to the huge data size and the relatively short length of the reads.…

Quantitative Methods · Quantitative Biology 2018-03-28 Jie Ren , Xin Bai , Yang Young Lu , Kujin Tang , Ying Wang , Gesine Reinert , Fengzhu Sun

GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis

Genome sequence analysis has enabled significant advancements in medical and scientific areas such as personalized medicine, outbreak tracing, and the understanding of evolution. Unfortunately, it is currently bottlenecked by the…

Hardware Architecture · Computer Science 2020-09-17 Damla Senol Cali , Gurpreet S. Kalsi , Zülal Bingöl , Can Firtina , Lavanya Subramanian , Jeremie S. Kim , Rachata Ausavarungnirun , Mohammed Alser , Juan Gomez-Luna , Amirali Boroumand , Anant Nori , Allison Scibisz , Sreenivas Subramoney , Can Alkan , Saugata Ghose , Onur Mutlu