Related papers: An Improved Filtering Algorithm for Big Read Datas…

Read classification using semi-supervised deep learning

In this paper, we propose a semi-supervised deep learning method for detecting the specific types of reads that impede the de novo genome assembly process. Instead of dealing directly with sequenced reads, we analyze their coverage graphs…

Machine Learning · Computer Science 2019-04-24 Tomislav Šebrek, Jan Tomljanović, Josip Krapac, Mile Šikić

A read-filtering algorithm for high-throughput marker-gene studies that greatly improves OTU accuracy

Adequate read filtering is critical when processing high-throughput data in marker-gene-based studies. Sequencing errors can cause the mis-clustering of otherwise similar reads, artificially increasing the number of retrieved Operational…

Quantitative Methods · Quantitative Biology 2015-06-02 Fernando Puente-Sánchez, Jacobo Aguirre, Víctor Parro

META$^\mathbf{2}$: Memory-efficient taxonomic classification and abundance estimation for metagenomics with deep learning

Metagenomic studies have increasingly utilized sequencing technologies in order to analyze DNA fragments found in environmental samples. One important step in this analysis is the taxonomic classification of the DNA fragments. Conventional…

Genomics · Quantitative Biology 2020-02-11 Andreas Georgiou, Vincent Fortuin, Harun Mustafa, Gunnar Rätsch

ReadsClean: a new approach to error correction of sequencing reads based on alignments clustering

Motivation: Next-generation methods of DNA sequencing produce a relatively high rate of reading errors, which interfere with de novo genome assembly of newly sequenced organisms and particularly affect the quality of SNP detection important…

Genomics · Quantitative Biology 2019-07-31 Oleg Fokin, Anastasia Bakulina, Igor Seledtsov, Victor Solovyev

Large-scale Machine Learning for Metagenomics Sequence Classification

Metagenomics characterizes the taxonomic diversity of microbial communities by sequencing DNA directly from an environmental sample. One of the main challenges in metagenomics data analysis is the binning step, where each sequenced read is…

Quantitative Methods · Quantitative Biology 2015-05-27 Kévin Vervier, Pierre Mahé, Maud Tournoud, Jean-Baptiste Veyrieras, Jean-Philippe Vert

Single-Read Reconstruction for DNA Data Storage Using Transformers

As the global need for large-scale data storage is rising exponentially, existing storage technologies are approaching their theoretical and functional limits in terms of density and energy consumption, making DNA based storage a potential…

Emerging Technologies · Computer Science 2021-10-12 Yotam Nahum, Eyar Ben-Tolila, Leon Anavy

GRIM-filter: fast seed filtering in read mapping using emerging memory technologies

Motivation: Seed filtering is critical in DNA read mapping, a process where billions of DNA fragments (reads) sampled from a donor are mapped onto a reference genome to identify genomic variants of the donor. Read mappers 1) quickly…

Genomics · Quantitative Biology 2017-08-16 Jeremie S Kim, Damla Senol, Hongyi Xin, Donghyuk Lee, Saugata Ghose, Mohammed Alser, Hasan Hassan, Oguz Ergin, Can Alkan, Onur Mutlu

GRIM-Filter: Fast Seed Location Filtering in DNA Read Mapping Using Processing-in-Memory Technologies

Motivation: Seed location filtering is critical in DNA read mapping, a process where billions of DNA fragments (reads) sampled from a donor are mapped onto a reference genome to identify genomic variants of the donor. State-of-the-art read…

Genomics · Quantitative Biology 2020-04-21 Jeremie S. Kim, Damla Senol Cali, Hongyi Xin, Donghyuk Lee, Saugata Ghose, Mohammed Alser, Hasan Hassan, Oguz Ergin, Can Alkan, Onur Mutlu

Assembly of repetitive regions using next-generation sequencing data

High read depth can be used to assemble short sequence repeats. Existing genome assemblers fail in repetitive regions longer than the average read. I propose a new algorithm for DNA assembly which uses the relative frequency of reads…

Genomics · Quantitative Biology 2015-01-08 Robert M. Nowak

Robust Multi-Read Reconstruction from Contaminated Clusters Using Deep Neural Network for DNA Storage

DNA has immense potential as an emerging data storage medium. The principle of DNA storage is the conversion and flow of digital information between binary code stream, quaternary base, and actual DNA fragments. This process will inevitably…

Information Retrieval · Computer Science 2022-10-21 Yun Qin, Fei Zhu, Bo Xi

featureCounts: An efficient general-purpose program for assigning sequence reads to genomic features

Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to…

Genomics · Quantitative Biology 2016-07-26 Yang Liao, Gordon K Smyth, Wei Shi

Do Read Errors Matter for Genome Assembly?

While most current high-throughput DNA sequencing technologies generate short reads with low error rates, emerging sequencing technologies generate long reads with high error rates. A basic question of interest is the tradeoff between read…

Information Theory · Computer Science 2015-01-27 Ilan Shomorony, Thomas Courtade, David Tse

ProDOMA: improve PROtein DOMAin classification for third-generation sequencing reads using deep learning

Motivation: With the development of third-generation sequencing technologies, people are able to obtain DNA sequences with lengths from 10s to 100s of kb. These long reads allow protein domain annotation without assembly, thus can produce…

Genomics · Quantitative Biology 2021-07-09 Du Nan, Jiayu Shang, Yanni Sun

Rapid Sequence Identification of Potential Pathogens Using Techniques from Sparse Linear Algebra

The decreasing costs and increasing speed and accuracy of DNA sample collection, preparation, and sequencing have rapidly produced an enormous volume of genetic data. However, fast and accurate analysis of the samples remains a bottleneck.…

Quantitative Methods · Quantitative Biology 2017-04-13 Stephanie Dodson, Darrell O. Ricke, Jeremy Kepner, Nelson Chiu, Anna Shcherbina

Assembling large, complex environmental metagenomes

The large volumes of sequencing data required to sample complex environments deeply pose new challenges to sequence analysis approaches. De novo metagenomic assembly effectively reduces the total amount of data to be analyzed but requires…

Genomics · Quantitative Biology 2013-01-01 Adina Chuang Howe, Janet Jansson, Stephanie A. Malfatti, Susannah G. Tringe, James M. Tiedje, C. Titus Brown

SPRISS: Approximating Frequent $k$-mers by Sampling Reads, and Applications

The extraction of $k$-mers is a fundamental component in many complex analyses of large next-generation sequencing datasets, including reads classification in genomics and the characterization of RNA-seq datasets. The extraction of all…

Quantitative Methods · Quantitative Biology 2021-01-19 Diego Santoro, Leonardo Pellegrina, Fabio Vandin

Kernel Density Estimation by Genetic Algorithm

This study proposes a data condensation method for multivariate kernel density estimation by genetic algorithm. First, our proposed algorithm generates multiple subsamples of a given size with replacement from the original sample. The…

Methodology · Statistics 2022-03-04 Kiheiji Nishida

DsDm: Model-Aware Dataset Selection with Datamodels

When selecting data for training large-scale models, standard practice is to filter for examples that match human notions of data quality. Such filtering yields qualitatively clean datapoints that intuitively should improve model behavior.…

Machine Learning · Computer Science 2024-01-24 Logan Engstrom, Axel Feldmann, Aleksander Madry

Accelerating Deep Learning with Dynamic Data Pruning

Deep learning's success has been attributed to the training of large, overparameterized models on massive amounts of data. As this trend continues, model training has become prohibitively costly, requiring access to powerful computing…

Machine Learning · Computer Science 2021-11-25 Ravi S Raju, Kyle Daruwalla, Mikko Lipasti

Sequence assembly from corrupted shotgun reads

The prevalent technique for DNA sequencing consists of two main steps: shotgun sequencing, where many randomly located fragments, called reads, are extracted from the overall sequence, followed by an assembly algorithm that aims to…

Genomics · Quantitative Biology 2016-01-28 Shirshendu Ganguly, Elchanan Mossel, Miklos Z. Racz