
Read classification using semi-supervised deep learning

2019-04-24 (v1) · Subjects: Genomics, Machine Learning

Abstract

In this paper, we propose a semi-supervised deep learning method for detecting the specific types of reads that impede the de novo genome assembly process. Instead of dealing directly with sequenced reads, we analyze their coverage graphs converted to 1D signals. We observed that each relevant class of reads exhibits characteristic signal patterns. We chose a semi-supervised approach because manually labeling the data is a slow and tedious process; our goal was to facilitate assembly with as little labeled data as possible. We tested two models for learning patterns in the coverage graphs: M1+M2 and semi-GAN. We evaluated each model, as a function of the number of labeled examples used during training, on a manually labeled dataset comprising reads from multiple reference genomes. In addition, we embedded our detection method in the assembly process, which improved the quality of the resulting assemblies.
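The abstract does not say exactly how a coverage graph is built, so the following is only a minimal sketch under stated assumptions: a read's coverage graph is taken to be the number of other reads overlapping each of its bases, overlaps are given as hypothetical half-open `(start, end)` intervals on the read, and signals are resampled to a hypothetical fixed length (1024) so that reads of different lengths can be fed to the same network. It illustrates the idea of turning a read into a 1D signal, not the authors' actual pipeline.

```python
import numpy as np


def coverage_signal(read_length, overlaps):
    """Per-base coverage of one read (assumed definition, not the paper's code).

    overlaps: iterable of half-open (start, end) intervals on this read,
    one per overlapping read (hypothetical input format).
    """
    # Difference array: +1 where an overlap starts, -1 where it ends.
    diff = np.zeros(read_length + 1, dtype=np.int64)
    for start, end in overlaps:
        # Clip intervals to the read's bounds.
        start, end = max(0, start), min(read_length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    # Prefix sum turns interval boundaries into per-base counts.
    return np.cumsum(diff[:-1])


def resample(signal, length=1024):
    """Linearly resample a signal to a fixed length (assumed preprocessing)."""
    x_old = np.linspace(0.0, 1.0, num=len(signal))
    x_new = np.linspace(0.0, 1.0, num=length)
    return np.interp(x_new, x_old, np.asarray(signal, dtype=float))


if __name__ == "__main__":
    sig = coverage_signal(10, [(0, 5), (3, 10), (8, 12)])
    print(sig.tolist())  # [1, 1, 1, 2, 2, 1, 1, 1, 2, 2]
    print(resample(sig, 4).shape)  # (4,)
```

The difference-array approach computes coverage in time linear in the read length plus the number of overlaps, which matters when every read in a dataset must be converted to a signal before training.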


Cite

@article{arxiv.1904.10353,
  title  = {Read classification using semi-supervised deep learning},
  author = {Tomislav Šebrek and Jan Tomljanović and Josip Krapac and Mile Šikić},
  journal= {arXiv preprint arXiv:1904.10353},
  year   = {2019}
}

Comments

2nd International Workshop on Deep Learning for Precision Medicine, ECML-PKDD 2017, Skopje, North Macedonia