Related papers: An Improved Filtering Algorithm for Big Read Datas…

200 papers

In this paper, we propose a semi-supervised deep learning method for detecting the specific types of reads that impede the de novo genome assembly process. Instead of dealing directly with sequenced reads, we analyze their coverage graphs…

Machine Learning · Computer Science 2019-04-24 Tomislav Šebrek, Jan Tomljanović, Josip Krapac, Mile Šikić

Adequate read filtering is critical when processing high-throughput data in marker-gene-based studies. Sequencing errors can cause the mis-clustering of otherwise similar reads, artificially increasing the number of retrieved Operational…

Quantitative Methods · Quantitative Biology 2015-06-02 Fernando Puente-Sánchez, Jacobo Aguirre, Víctor Parro

Metagenomic studies have increasingly utilized sequencing technologies in order to analyze DNA fragments found in environmental samples. One important step in this analysis is the taxonomic classification of the DNA fragments. Conventional…

Genomics · Quantitative Biology 2020-02-11 Andreas Georgiou, Vincent Fortuin, Harun Mustafa, Gunnar Rätsch

Motivation: Next generation methods of DNA sequencing produce a relatively high rate of reading errors, which interfere with de novo genome assembly of newly sequenced organisms and particularly affect the quality of SNP detection important…

Genomics · Quantitative Biology 2019-07-31 Oleg Fokin, Anastasia Bakulina, Igor Seledtsov, Victor Solovyev

Metagenomics characterizes the taxonomic diversity of microbial communities by sequencing DNA directly from an environmental sample. One of the main challenges in metagenomics data analysis is the binning step, where each sequenced read is…

Quantitative Methods · Quantitative Biology 2015-05-27 Kévin Vervier, Pierre Mahé, Maud Tournoud, Jean-Baptiste Veyrieras, Jean-Philippe Vert

As the global need for large-scale data storage is rising exponentially, existing storage technologies are approaching their theoretical and functional limits in terms of density and energy consumption, making DNA based storage a potential…

Emerging Technologies · Computer Science 2021-10-12 Yotam Nahum, Eyar Ben-Tolila, Leon Anavy

Motivation: Seed filtering is critical in DNA read mapping, a process where billions of DNA fragments (reads) sampled from a donor are mapped onto a reference genome to identify genomic variants of the donor. Read mappers 1) quickly…

Motivation: Seed location filtering is critical in DNA read mapping, a process where billions of DNA fragments (reads) sampled from a donor are mapped onto a reference genome to identify genomic variants of the donor. State-of-the-art read…

High read depth can be used to assemble short sequence repeats. The existing genome assemblers fail in repetitive regions longer than the average read. I propose a new algorithm for DNA assembly which uses the relative frequency of reads…

Genomics · Quantitative Biology 2015-01-08 Robert M. Nowak

DNA has immense potential as an emerging data storage medium. The principle of DNA storage is the conversion and flow of digital information between binary code stream, quaternary base, and actual DNA fragments. This process will inevitably…

Information Retrieval · Computer Science 2022-10-21 Yun Qin, Fei Zhu, Bo Xi

Next-generation sequencing technologies generate millions of short sequence reads, which are usually aligned to a reference genome. In many applications, the key information required for downstream analysis is the number of reads mapping to…

Genomics · Quantitative Biology 2016-07-26 Yang Liao, Gordon K Smyth, Wei Shi

While most current high-throughput DNA sequencing technologies generate short reads with low error rates, emerging sequencing technologies generate long reads with high error rates. A basic question of interest is the tradeoff between read…

Information Theory · Computer Science 2015-01-27 Ilan Shomorony, Thomas Courtade, David Tse

Motivation: With the development of third-generation sequencing technologies, people are able to obtain DNA sequences with lengths from 10s to 100s of kb. These long reads allow protein domain annotation without assembly, thus can produce…

Genomics · Quantitative Biology 2021-07-09 Du Nan, Jiayu Shang, Yanni Sun

The decreasing costs and increasing speed and accuracy of DNA sample collection, preparation, and sequencing have rapidly produced an enormous volume of genetic data. However, fast and accurate analysis of the samples remains a bottleneck.…

Quantitative Methods · Quantitative Biology 2017-04-13 Stephanie Dodson, Darrell O. Ricke, Jeremy Kepner, Nelson Chiu, Anna Shcherbina

The large volumes of sequencing data required to sample complex environments deeply pose new challenges to sequence analysis approaches. De novo metagenomic assembly effectively reduces the total amount of data to be analyzed but requires…

The extraction of $k$-mers is a fundamental component in many complex analyses of large next-generation sequencing datasets, including reads classification in genomics and the characterization of RNA-seq datasets. The extraction of all…

Quantitative Methods · Quantitative Biology 2021-01-19 Diego Santoro, Leonardo Pellegrina, Fabio Vandin

This study proposes a data condensation method for multivariate kernel density estimation by genetic algorithm. First, our proposed algorithm generates multiple subsamples of a given size with replacement from the original sample. The…

Methodology · Statistics 2022-03-04 Kiheiji Nishida

When selecting data for training large-scale models, standard practice is to filter for examples that match human notions of data quality. Such filtering yields qualitatively clean datapoints that intuitively should improve model behavior.…

Machine Learning · Computer Science 2024-01-24 Logan Engstrom, Axel Feldmann, Aleksander Madry

Deep learning's success has been attributed to the training of large, overparameterized models on massive amounts of data. As this trend continues, model training has become prohibitively costly, requiring access to powerful computing…

Machine Learning · Computer Science 2021-11-25 Ravi S Raju, Kyle Daruwalla, Mikko Lipasti

The prevalent technique for DNA sequencing consists of two main steps: shotgun sequencing, where many randomly located fragments, called reads, are extracted from the overall sequence, followed by an assembly algorithm that aims to…

Genomics · Quantitative Biology 2016-01-28 Shirshendu Ganguly, Elchanan Mossel, Miklos Z. Racz