Related papers: SPATA: A Seeding and Patching Algorithm for Hybrid…
Motivation: Transcriptome sequencing has long been the favored method for quickly and inexpensively obtaining the sequences for a large number of genes from an organism with no reference genome. With the rapidly increasing throughputs and…
RNA-seq allows detection and precise quantification of transcripts, provides comprehensive understanding of exon/intron boundaries, aids discovery of alternatively spliced isoforms and fusion transcripts along with measurement of…
The high-throughput short-reads RNA-seq protocols often produce paired-end reads, with the middle portion of the fragments being unsequenced. We explore if the full-length fragments can be computationally reconstructed from the sequenced…
RNA-Seq technology offers new high-throughput ways for transcript identification and quantification based on short reads, and has recently attracted great interest. The problem is usually modeled by a weighted splicing graph whose nodes…
De novo genome assembly is the process of stitching short DNA sequences to generate longer DNA sequences, without using any reference sequence for alignment. It enables high-throughput genome sequencing and thus accelerates the discovery of…
De novo genome assembly is challenging in highly repetitive regions; however, reference-guided assemblers often suffer from bias. We propose a framework for pangenome-guided sequence assembly, which can resolve short-read data in complex…
The study of functional genomics, particularly in non-model organisms, has been dramatically improved over the last few years by the use of transcriptomes and RNAseq. While these studies are potentially extremely powerful, a computationally…
Recent advances in high-throughput cDNA sequencing (RNA-Seq) technology have revolutionized transcriptome studies. A major motivation for RNA-Seq is to map the structure of expressed transcripts at nucleotide resolution. With accurate…
Motivation: De novo transcriptome assembly of non-model organisms is the first major step for many RNA-seq analysis tasks. Current methods for de novo assembly often report a large number of contiguous sequences (contigs), which may be…
Identification and quantification of condition-specific transcripts using RNA-Seq is vital in transcriptomics research. While initial efforts using mathematical or statistical modeling of read counts or per-base exonic signal have been…
Motivation: RNA-seq has made feasible the analysis of a whole set of expressed mRNAs. Mapping-based assembly of RNA-seq reads is sometimes infeasible due to the lack of high-quality references. However, de novo assembly is very challenging due…
Genome assembly from high-throughput sequencing (HTS) reads is a fundamental yet challenging computational problem. An intrinsic challenge is the uncertainty caused by widespread repetitive elements. Here we get around the…
The de novo assembly of large, complex genomes is a significant challenge with currently available DNA sequencing technology. While many de novo assembly software packages are available, comparatively little attention has been paid to…
Large biological datasets are being produced at a rapid pace and create substantial storage challenges, particularly in the domain of high-throughput sequencing (HTS). Most approaches currently used to store HTS data are either unable to…
Genome assembly using high-throughput short-read data arguably remains an unresolvable task in repetitive genomes: when the length of a repeat exceeds the read length, it becomes difficult to unambiguously connect the flanking…
Whole transcriptome sequencing is increasingly being used as a functional genomics tool to study non-model organisms. However, when the reference transcriptome used to calculate differential expression is incomplete, significant error in…
High read depth can be used to assemble short sequence repeats. Existing genome assemblers fail in repetitive regions longer than the average read. I propose a new algorithm for DNA assembly which uses the relative frequency of reads…
High-throughput cDNA sequencing (RNA-seq) is a very powerful technique to quantify gene expression in an unbiased way. The Crustacean family is among the groups of organisms sparsely represented in current genomic databases. Here we present…
Motivation: New long-read sequencers promise to transform sequencing and genome assembly by producing reads tens of kilobases long. However, their high error rate significantly complicates assembly and requires expensive correction steps to…
The main challenge in de novo assembly of NGS data is certainly to deal with repeats that are longer than the reads. This is particularly true for RNA-seq data, since coverage information cannot be used to flag repeated sequences, of which…